Cleaning Strange Characters from Text Fields in Excel

Sometimes, when you import data from external sources or copy and paste text from other applications, you may end up with unwanted characters in your Excel cells. These characters can be invisible, such as spaces, tabs, line breaks, or non-printing characters, or visible, such as symbols, punctuation marks, or foreign characters. These characters can cause problems when you try to manipulate, analyze, or format your data. For example, they can prevent you from applying formulas, sorting, filtering, or converting data types.

Fortunately, Excel provides several ways to remove or replace these characters, depending on your needs and preferences. In this article, we will explain the basic theory behind some of the most common methods, and show you how to apply them in practice. We will also give you a scenario to illustrate how these methods work with real data, and compare the advantages and disadvantages of each approach.

Using Find and Replace

One of the simplest and fastest ways to remove specific characters in Excel is to use the Find and Replace feature. This feature allows you to search for a character or a string of characters, and replace it with another character, a blank, or nothing. To use this feature, follow these steps:

  • Select a range of cells where you want to remove a specific character.
  • Press Ctrl + H to open the Find and Replace dialog box.
  • In the Find what box, type the character or the string of characters that you want to remove. For example, if you want to remove the hash symbol (#), type # in the box.
  • Leave the Replace with box empty, or type a character or a string of characters that you want to replace the original one with. For example, if you want to replace the hash symbol (#) with a dash (-), type - in the box.
  • Click Replace All to apply the change to all the selected cells, or click Replace to apply the change to one cell at a time.

Here is an example of how you can use Find and Replace to remove the hash symbol (#) from a range of cells:

 

As you can see, the hash symbol is removed from all of the selected cells at once, and a pop-up dialog informs you how many replacements have been made.

Advantages and disadvantages of Find and Replace

The main advantages of using Find and Replace are:

  • It is easy to use and does not require any formulas or functions.
  • It can remove or replace any character or string of characters, regardless of their position or frequency in the cell.
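  • It can match patterns rather than only fixed strings, by using the wildcard characters: * stands for any sequence of characters and ? for any single character. To search for a literal asterisk, question mark, or tilde, precede it with a tilde (~).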

The main disadvantages of using Find and Replace are:

  • It changes the original data directly, which can be risky if you make a mistake or want to revert to the original values. To avoid this, you can make a backup copy of your data before using Find and Replace, or use Ctrl + Z to undo the change immediately after applying it.
  • It is not case-sensitive by default, which means that it will also find and replace characters that have a different letter case than the one you typed in the Find what box. To restrict the search to the exact case, click Options to expand the Find and Replace dialog box, and then tick the Match case box to perform a case-sensitive search.
  • It is awkward to use with non-printing characters, such as tabs, line breaks, or non-breaking spaces, because they are not visible in the Find what box. The box does not evaluate functions, so typing CHAR(160) searches for that literal text and not for a non-breaking space. Instead, you need to enter the character itself, which can be tedious and error-prone. For example, press Ctrl + J to enter a line break, hold Alt and type 0160 on the numeric keypad to enter a non-breaking space (on Windows), or copy the character from a cell and paste it into the box.
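
When a cell looks clean but still misbehaves, it can help to identify the hidden character before searching for it. A minimal sketch, assuming the text is in B2 (an illustrative cell reference) and the suspect character is the last one in the cell:

=UNICODE(RIGHT(B2,1))

A result of 160 would point to a non-breaking space, 10 to a line break, 9 to a tab, and 32 to a regular space. UNICODE is available in Excel 2013 and later; in older versions, CODE returns the same numbers for these characters on Windows.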

Using SUBSTITUTE function

Another way to remove specific characters in Excel is to use the SUBSTITUTE function. This function replaces existing text with new text in a given string. It matches the text to replace exactly, and the match is case-sensitive. The syntax of the function is:

=SUBSTITUTE(text, old_text, new_text, [instance_num])

Where:

  • text is the cell or the string of text that you want to modify.
  • old_text is the character or the string of characters that you want to remove or replace.
  • new_text is the character or the string of characters that you want to replace the old_text with. If you want to remove the old_text, use an empty string ("") for this argument.
  • instance_num is an optional argument that specifies which occurrence of the old_text you want to replace. If you omit this argument, the function will replace all occurrences of the old_text.
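
To make the effect of instance_num concrete, here is a small illustration that uses a literal string instead of a cell reference (the sample text A#B#C is made up for this example):

=SUBSTITUTE("A#B#C","#","")
=SUBSTITUTE("A#B#C","#","",2)

The first formula removes every hash symbol and returns ABC. The second removes only the second hash symbol and returns A#BC.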

To use the SUBSTITUTE function, enter the formula in a cell, and then copy it down or across to apply it to other cells. Here is an example of how you can use the SUBSTITUTE function to remove the hash symbol (#) from a range of cells:

 

As you can see, the formula in C2 is:

=SUBSTITUTE(B2,"#","")

This formula tells Excel to take the text in B2, and replace each hash symbol (#) with an empty string (""), which effectively removes the character. The result is a text string without the hash symbol. The same formula is copied down to C3:C6 to remove the hash symbol from the other cells.

Advantages and disadvantages of SUBSTITUTE function

The main advantages of using the SUBSTITUTE function are:

  • It does not change the original data, but returns a new text string in a different cell, which preserves the original values and allows you to compare the results.
  • It can remove or replace any character or string of characters, regardless of their position or frequency in the cell.
  • It can remove or replace a specific occurrence of a character or string of characters, by using the instance_num argument.

The main disadvantages of using the SUBSTITUTE function are:

  • It requires a formula, which can be cumbersome and complex if you need to remove or replace multiple characters or strings of characters. In that case, you may need to nest several SUBSTITUTE functions together, or use other functions, such as CONCATENATE, LEFT, RIGHT, MID, etc.
  • It always returns a text string, even if the result contains only numbers. This can cause problems if you want to perform calculations or apply formatting to the result. To solve this, you can wrap the SUBSTITUTE function in the VALUE function, which converts a text string to a number, like this:

=VALUE(SUBSTITUTE(B2,"#",""))
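
As a sketch of the nesting mentioned in the disadvantages above, assume the text is in B2 and that both the hash symbol (#) and the dash (-) should be removed (the dash is chosen only for illustration):

=SUBSTITUTE(SUBSTITUTE(B2,"#",""),"-","")

The inner SUBSTITUTE removes the hash symbols, and the outer one removes the dashes from that result. Each additional character needs another level of nesting, which is why such formulas become hard to read.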

Using CLEAN and TRIM functions

To remove non-printing characters and extra spaces, you can use the CLEAN and TRIM functions. CLEAN removes non-printing characters from a text string, and TRIM removes extra spaces. The syntax of these functions is:

=CLEAN(text)
=TRIM(text)

Where:

  • text is the cell or the string of text that you want to modify.

The CLEAN function removes the non-printing characters that have a code value from 0 to 31, such as tabs and line breaks. It does not remove regular spaces (code 32) or non-breaking spaces (code 160). The TRIM function removes any leading, trailing, or repeated regular spaces from a text string, leaving only single spaces between words. It does not remove non-breaking spaces either.

To use these functions, enter the formula in a cell, and then copy it down or across to apply it to other cells. You can also combine these functions in a single formula, like this:

=TRIM(CLEAN(text))

This formula will first remove any non-printing characters using the CLEAN function, and then remove any extra spaces using the TRIM function. Here is an example of how you can use the CLEAN and TRIM functions to remove non-printing characters and extra spaces from a range of cells:

[Image: CLEAN and TRIM functions example]

As you can see, the formula in C2 is:

=TRIM(CLEAN(B2))

This formula tells Excel to take the text in B2, and remove any non-printing characters using the CLEAN function, and then remove any extra spaces using the TRIM function. The result is a text string without any non-printing characters or extra spaces. The same formula is copied down to C3:C6 to clean the other cells.

Advantages and disadvantages of CLEAN and TRIM functions

The main advantages of using the CLEAN and TRIM functions are:

  • They do not change the original data, but return a new text string in a different cell, which preserves the original values and allows you to compare the results.
  • They can remove most of the common non-printing characters and extra spaces that can cause problems in Excel, without requiring you to know their code numbers or use the Find and Replace feature.
  • They can be easily combined in a single formula, or with other functions, such as SUBSTITUTE, to perform more complex text cleaning operations.
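
As a sketch of such a combination, the following formula also deals with non-breaking spaces (character code 160), which CLEAN and TRIM leave in place. It assumes the text is in B2, as in the example above:

=TRIM(CLEAN(SUBSTITUTE(B2,CHAR(160)," ")))

SUBSTITUTE first turns each non-breaking space into a regular space, CLEAN then strips the non-printing characters, and TRIM finally collapses the extra spaces. CHAR(160) is the usual way to write the non-breaking space on Windows; UNICHAR(160), available in Excel 2013 and later, is an alternative.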

The main disadvantages of using the CLEAN and TRIM functions are:

  • They cannot remove or replace any visible characters, such as symbols, punctuation marks, or foreign characters. For that, you need to use the Find and Replace feature or the SUBSTITUTE function.
  • They always return a text string, even if the result contains only numbers. This can cause problems if you want to perform calculations or apply formatting to the result. To solve this, you can wrap the CLEAN and TRIM functions in the VALUE function, which converts a text string to a number, like this:

=VALUE(TRIM(CLEAN(B2)))
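
Note that VALUE returns a #VALUE! error when the cleaned text is not a number. A possible variation, assuming the column may mix numeric and non-numeric entries, falls back to the cleaned text in that case:

=IFERROR(VALUE(TRIM(CLEAN(B2))),TRIM(CLEAN(B2)))

Cells that contain a number are converted, and all other cells keep their cleaned text instead of showing an error.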
