Sometimes, when you import data from other sources, such as text files, web pages, or databases, you may encounter some non-printable characters or line breaks that can affect the appearance and functionality of your data. For example, you may see some strange symbols, spaces, or tabs that are not visible in the original source, but appear in Excel cells. These non-printable characters can cause problems when you try to perform calculations, sort, filter, or format your data.
Fortunately, Excel has a built-in function called CLEAN that can help you remove these non-printable characters from your text. In this article, we will explain what the CLEAN function does, how to use it, and how to apply it to a real-life scenario.
What is the CLEAN Function?
The CLEAN function is a text function that removes non-printable characters from a text string. Non-printable characters are not displayed on screen or printed on paper. Instead, they are used for internal purposes such as formatting, control, or communication. Specifically, CLEAN removes the first 32 non-printable characters in the 7-bit ASCII code (values 0 through 31). ASCII is a standard system for encoding text characters. Some examples of non-printable characters are:
- CHAR(0): Null character
- CHAR(9): Horizontal tab
- CHAR(10): Line feed
- CHAR(13): Carriage return
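The space character, CHAR(32), is printable, so CLEAN leaves it in place. CLEAN also does not remove the additional non-printable characters in the Unicode character set (values 127, 129, 141, 143, 144, and 157), or the nonbreaking space, CHAR(160).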
The syntax of the CLEAN function is:
=CLEAN(text)
where text is the text string from which you want to remove non-printable characters. The text can be a cell reference, a text value, or a formula that returns text.
The CLEAN function returns a text string that is free of non-printable characters. If the text argument is empty or contains only non-printable characters, the CLEAN function returns an empty string.
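To make the rule concrete, here is a minimal Python sketch that emulates CLEAN's documented behavior. It drops every character whose code is 0 through 31 and keeps everything else. The function name `excel_clean` is hypothetical, and this is an approximation, not Excel's own implementation:

```python
def excel_clean(text: str) -> str:
    """Approximate Excel's CLEAN by dropping characters with codes 0-31.

    Assumption: this mirrors the documented behavior only. Spaces (32),
    CHAR(127), and the nonbreaking space CHAR(160) are kept.
    """
    return "".join(ch for ch in text if ord(ch) > 31)


# A value with a tab, a carriage return, and a line feed embedded in it.
raw = "John\tSmith\r\n"
print(repr(excel_clean(raw)))                 # 'JohnSmith'
print(repr(excel_clean("\x00\x09\x0a\x0d")))  # '' (only non-printable characters)
```

Note that CLEAN deletes a character rather than replacing it with a space. A tab between two words therefore joins them.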
How to Use the CLEAN Function
To use the CLEAN function, you need to enter it in a cell where you want to display the cleaned text. You can also use it as part of a larger formula, such as concatenating, extracting, or replacing text. Here are some examples of how to use the CLEAN function in Excel:
- To remove non-printable characters from cell A1, enter
=CLEAN(A1)
in another cell.
- To remove non-printable characters from a text value, enter
=CLEAN("This is a test" & CHAR(10))
in a cell. This removes the line feed character (CHAR(10)) from the end of the text.
- To remove non-printable characters from the result of a formula, enter
=CLEAN(LEFT(B1,10))
in a cell. This removes any non-printable characters from the first 10 characters of cell B1.
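The order of nesting matters in the last example. CLEAN(LEFT(B1,10)) cleans the first 10 characters, so it can return fewer than 10. LEFT(CLEAN(B1),10) cleans first and then takes 10 characters. The Python sketch below emulates both orders. The helpers `excel_clean` and `excel_left` and the contents of B1 are hypothetical:

```python
def excel_clean(text: str) -> str:
    # Emulates CLEAN: removes characters with codes 0 through 31.
    return "".join(ch for ch in text if ord(ch) > 31)


def excel_left(text: str, num_chars: int) -> str:
    # Emulates LEFT: the first num_chars characters of text.
    return text[:num_chars]


b1 = "Line\none\ttwo three"  # hypothetical contents of cell B1

print(repr(excel_clean(excel_left(b1, 10))))  # =CLEAN(LEFT(B1,10)) -> 'Lineonet'
print(repr(excel_left(excel_clean(b1), 10)))  # =LEFT(CLEAN(B1),10) -> 'Lineonetwo'
```

If you need exactly 10 visible characters, clean the text first and then extract from it.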
How to Apply the CLEAN Function to a Scenario
To demonstrate how the CLEAN function can be useful in a real-life situation, let us consider the following scenario:
You have imported some data from a web page that contains the names and email addresses of some customers. However, the data also contains some non-printable characters, such as tabs and line breaks, that make the data messy and inconsistent. You want to clean the data and remove these non-printable characters so that you can use the data for further analysis.
Here is what the imported data looks like in Excel. The non-printable characters are invisible, so they do not show up in the table:
Name | Email |
---|---|
John Smith | john.smith@example.com |
Mary Jones | mary.jones@example.com |
Peter Lee | peter.lee@example.com |
Lisa Brown | lisa.brown@example.com |
Here is what the data looks like after applying the CLEAN function:
Name | Email |
---|---|
John Smith | john.smith@example.com |
Mary Jones | mary.jones@example.com |
Peter Lee | peter.lee@example.com |
Lisa Brown | lisa.brown@example.com |
On screen, the values look the same before and after. After CLEAN, however, the hidden characters that interfere with sorting, filtering, and matching are gone, and the data is clean and uniform. To achieve this result, follow these steps:
- If the names and email addresses were imported into a single column (for example, separated by a tab), split them first. Select the imported cells in that column, such as A2:A5 in this example. Text to Columns works on one column at a time.
- Go to the Data tab and click Text to Columns in the Data Tools group.
- In the Convert Text to Columns Wizard, choose Delimited and click Next.
- In the Delimiters section, check Tab (or whichever character separates the names from the email addresses) and click Next.
- In the Column data format section, choose Text as the data format and click Finish.
- This splits the data into two columns: the name in column A and the email address in column B. If your data already arrived in two columns, skip the steps above. Either way, the cells may still contain non-printable characters, such as line breaks, that are not visible.
- To remove these non-printable characters, enter
=CLEAN(A2)
in cell C2 and drag it down to fill the rest of the column. This removes any non-printable characters from the name column.
- Do the same for the email column by entering
=CLEAN(B2)
in cell D2 and dragging it down to fill the rest of the column. This removes any non-printable characters from the email column.
- You can now copy the cleaned data and paste it as values to another location, or use it for further analysis.
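The workflow above can be sketched in Python under the following assumptions. Each imported row is one string, with the name and email separated by a tab and stray line breaks left over from the web page. The rows and the `excel_clean` helper are hypothetical, and splitting on the first tab stands in for Text to Columns:

```python
def excel_clean(text: str) -> str:
    # Emulates CLEAN: removes characters with codes 0 through 31.
    return "".join(ch for ch in text if ord(ch) > 31)


# Hypothetical imported rows: name and email separated by a tab, with
# stray carriage returns (CHAR(13)) and line feeds (CHAR(10)).
imported = [
    "John Smith\tjohn.smith@example.com\r\n",
    "Mary Jones\n\tmary.jones@example.com",
    "Peter Lee\tpeter.lee@example.com\n",
    "Lisa Brown\r\tlisa.brown@example.com",
]

cleaned = []
for row in imported:
    # Text to Columns with Tab as the delimiter: one name field, one email field.
    name, email = row.split("\t", 1)
    # =CLEAN(A2) and =CLEAN(B2), applied to each field.
    cleaned.append((excel_clean(name), excel_clean(email)))

for name, email in cleaned:
    print(f"{name} | {email}")
```

Splitting before cleaning matters here. Running CLEAN on the whole row first would delete the tab that separates the two fields.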
Other Approaches to Remove Non-printable Characters
The CLEAN function is not the only way to remove non-printable characters from text in Excel. Here are some other approaches that you can try:
- Use the SUBSTITUTE function to replace specific non-printable characters with an empty string. For example, to remove the line feed character (CHAR(10)) from cell A1, enter
=SUBSTITUTE(A1,CHAR(10),"")
in another cell. SUBSTITUTE also works for characters that CLEAN leaves behind, such as the nonbreaking space, CHAR(160).
- Use the TRIM function to remove extra spaces from text. The TRIM function removes all spaces from text except for single spaces between words. For example, to remove extra spaces from cell A1, enter
=TRIM(A1)
in another cell. Because CLEAN does not remove spaces, you can combine the two, as in =TRIM(CLEAN(A1)).
- Use the Find and Replace feature to replace non-printable characters with an empty string. Select the cells that contain the text and press Ctrl+H to open the Find and Replace dialog box. Enter the non-printable character in the Find what box (for a line break, press Ctrl+J in that box), leave the Replace with box blank, and click Replace All.
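The differences between these approaches are easiest to see side by side. The Python sketch below emulates SUBSTITUTE (without its optional instance argument), TRIM (which only touches the ordinary space, CHAR(32)), and the TRIM(CLEAN()) combination. The helper names and the contents of A1 are hypothetical, and these are approximations of the documented behavior, not Excel's code:

```python
import re


def excel_clean(text: str) -> str:
    # Emulates CLEAN: removes characters with codes 0 through 31.
    return "".join(ch for ch in text if ord(ch) > 31)


def excel_substitute(text: str, old: str, new: str) -> str:
    # Emulates SUBSTITUTE without instance_num: every occurrence is replaced.
    return text.replace(old, new)


def excel_trim(text: str) -> str:
    # Emulates TRIM: collapse runs of spaces (CHAR(32) only) to one space,
    # then strip leading and trailing spaces.
    return re.sub(" +", " ", text).strip(" ")


a1 = "  Peter\n  Lee\t "  # hypothetical contents of cell A1

print(repr(excel_substitute(a1, "\n", "")))  # =SUBSTITUTE(A1,CHAR(10),"") -> '  Peter  Lee\t '
print(repr(excel_trim(a1)))                  # =TRIM(A1) -> 'Peter\n Lee\t'
print(repr(excel_trim(excel_clean(a1))))     # =TRIM(CLEAN(A1)) -> 'Peter Lee'
```

CLEAN goes on the inside of the combined formula because removing a control character can leave two spaces side by side. TRIM then collapses them.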