One-hot encoding is a technique for transforming categorical data into numeric data. It creates one column per unique value in a variable, and assigns a 1 or 0 to indicate the presence or absence of that value in each row. For example, if you have a column with three possible values, A, B, and C, you create three new columns named A, B, and C, and fill them with 1 or 0 depending on the original value. The resulting numeric columns can then be used for machine learning or other purposes.
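As a minimal sketch of the idea outside Excel, the following Python function builds one 0/1 column per unique value for the A, B, C example. The name `one_hot` and the sample values are hypothetical and only serve the illustration.

```python
def one_hot(values):
    """Return (categories, rows); each row holds one 0/1 flag per category."""
    # The sorted unique values become the new column names.
    categories = sorted(set(values))
    # Each row gets a 1 in the column matching its value and 0 elsewhere.
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


categories, rows = one_hot(["A", "C", "B", "A"])
print(categories)
for row in rows:
    print(row)
```

Every output row contains exactly one 1, because each input cell holds a single value.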
To convert a single column with a list of values into one hot encoding using Power Query in Excel, you can follow these steps:
- Select the column that you want to encode, and go to Data > From Table/Range. This will open the Power Query Editor.
- In the Power Query Editor, select the column, and go to Transform > Split Column > By Delimiter. Choose the delimiter that separates the values in your list, such as comma, semicolon, or space. Under Advanced options, choose to split into rows, and click OK. This will create one row for each value in each list.
- Add a custom column with the value 1. Go to Add Column > Custom Column, and enter 1 as the formula. Name the column as Value, and click OK.
- Pivot the column that contains the split values. Select the column, and go to Transform > Pivot Column. Choose Value as the values column, and Sum as the aggregate value function (under Advanced options). Click OK. This will create a new column for each unique value, with 1 where that value was present in the original list and null where it was absent. To turn the nulls into 0, select the new columns, go to Transform > Replace Values, and replace null with 0.
- Optionally, check that each ID appears on a single row. Pivot Column already groups by the remaining columns, so when ID is the only other column, no further step is needed. If other columns cause an ID to span several rows, select the ID column, go to Home > Group By, and add a Sum (or Max) aggregation for each new value column. This merges the rows that share an ID into one row of one hot encoded values.
- Close and load the query to a worksheet. Go to Home > Close & Load, and choose where you want to load the query. This will create a new table with the one hot encoded data.
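The logic of the steps above can be sketched in plain Python to show what each step does to the data. This is an illustration under assumptions, not the M code that Power Query generates: the function name `one_hot_lists` is hypothetical, the sample rows are made up, and the delimiter is assumed to be a semicolon.

```python
def one_hot_lists(records, delimiter=";"):
    """records: list of (id, delimited string). Returns (header, rows)."""
    # Split each list into one (id, value) pair per item ("split into rows").
    long_rows = [(row_id, item.strip())
                 for row_id, text in records
                 for item in text.split(delimiter) if item.strip()]

    # Attach the constant 1 to every pair and pivot, summing per (id, value).
    columns = []   # unique values in order of first appearance
    counts = {}    # id -> {value: summed 1s}
    for row_id, item in long_rows:
        if item not in columns:
            columns.append(item)
        per_id = counts.setdefault(row_id, {})
        per_id[item] = per_id.get(item, 0) + 1

    # Combinations that never occurred have no count; report them as 0.
    header = ["ID"] + columns
    rows = [[row_id] + [counts[row_id].get(col, 0) for col in columns]
            for row_id in counts]
    return header, rows


header, rows = one_hot_lists([
    (1, "Food"),
    (2, "Automation"),
    (3, "Mechatronics"),
    (4, "Automation;Mechatronics"),
])
print(header)
for row in rows:
    print(row)
```

Because the sketch sums the 1s, a value repeated within one list would produce 2 rather than 1 in this code; deduplicate the pairs first if strict 0/1 output is required.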
Here is an example scenario to illustrate the process:
Suppose you have a table with two columns: ID and Segments. The Segments column contains a list of values separated by semicolons, such as Food;Automation;Mechatronics. You want to convert this column into one hot encoding using Power Query in Excel.
The table looks like this:
ID | Segments |
---|---|
1 | Food |
2 | Automation |
3 | Mechatronics |
4 | Automation;Mechatronics |
After following the steps above, the table will look like this:
ID | Food | Automation | Mechatronics |
---|---|---|---|
1 | 1 | 0 | 0 |
2 | 0 | 1 | 0 |
3 | 0 | 0 | 1 |
4 | 0 | 1 | 1 |
This is the result of the scenario.
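Another approach to convert a single column with a list of values into one hot encoding is to use formulas in Excel. The INDEX and COLUMN functions build the header row, and IF fills the new columns with 1 or 0. The cell references below assume that the IDs are in column A and the Segments in column B, with the data starting in row 3 and the output placed in columns D onward. Here are the formulas you can use:
- In cell E2, enter
=INDEX($B:$B,COLUMN(H:H)-COLUMN($E:$E))
and drag it across as needed. This copies B3, B4, B5, and so on into the header row. It does not split lists or remove duplicates, so it only produces clean headers when the first rows each hold a single distinct value, as in the example; otherwise, type the headers manually.
- In cell D3, enter
=A3
and drag it down as needed. This will copy the ID column.
- In cell E3, enter
=IF(ISNUMBER(SEARCH(";"&E$2&";",";"&$B3&";")),1,0)
and drag it across and down as needed. This fills each cell with 1 if the header appears as one of the semicolon-separated items in the Segments cell, and 0 otherwise. A plain comparison such as =IF($B3=E$2,1,0) would only match rows that contain a single value.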
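The difference between comparing a whole cell with a header and testing whether the header is one of the items in the cell can be sketched in Python. The helper name `contains_segment` is hypothetical, and the semicolon delimiter is an assumption taken from the example data.

```python
def contains_segment(cell, header, delimiter=";"):
    """Return 1 if header is one of the delimited items in cell, else 0."""
    items = [item.strip() for item in cell.split(delimiter)]
    return 1 if header in items else 0


headers = ["Food", "Automation", "Mechatronics"]
cell = "Automation;Mechatronics"

# Membership test on the split list: flags both segments in the cell.
membership = [contains_segment(cell, h) for h in headers]
# Whole-cell equality: no header equals the full list, so nothing is flagged.
exact = [1 if cell == h else 0 for h in headers]

print(membership)
print(exact)
```

Splitting before comparing also avoids partial matches, for example a header that happens to be a substring of a longer segment name.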
Using the same example scenario as before, the table will look like this after applying the formulas:
ID | Food | Automation | Mechatronics |
---|---|---|---|
1 | 1 | 0 | 0 |
2 | 0 | 1 | 0 |
3 | 0 | 0 | 1 |
4 | 0 | 1 | 1 |
This is another way to get the same result as the Power Query method.