Sometimes you have a column of numeric data and need to find the three digits that occur most often in positions 3-5 of the values, regardless of the order in which they appear. For example, given the following data:
Data |
---|
99123456 |
98321889 |
99456777 |
98231666 |
99221457 |
98997000 |
Here the most common digits in positions 3-5 are 2, 1, and 3, which occur 5, 4, and 3 times respectively.
In this article, we will show you how to use Excel formulas to solve this problem. We will also explain the basic theory behind the formulas and provide a detailed example with real numbers.
To find the three most common digits in positions 3-5 of a numeric string, follow these steps:
- Extract the digits at positions 3, 4, and 5 from each number using the MID function, and add 1 to each so that the text results become numbers and none of them is 0.
- Convert the extracted values into a single-column array using the TOCOL function.
- Compare each value in that column with every value in its horizontal counterpart, produced by the TOROW function, using the equal sign (=). The result is a square array of TRUE and FALSE values.
- Convert the comparison results into numbers using the N function, where TRUE becomes 1 and FALSE becomes 0.
- Multiply that array by a column of ones using the MMULT function. Each resulting row sum is the number of times that row's digit occurs.
- Horizontally stack the value column and the count column using the HSTACK function.
- Remove the duplicate rows with the UNIQUE function, then sort the array by the count column in descending order with the SORT function.
- Take the first 3 rows of the first column with the TAKE function, and subtract 1 from each value to recover the original digits.
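To make the intermediate arrays easier to picture, the steps above can be sketched outside Excel. The following is a minimal Python sketch, not part of the Excel solution; the variable names are illustrative, and the sample values are assumed to be the ones from the Data column. The add-1/subtract-1 offset is omitted because it only matters inside Excel.

```python
# Sample values from the article's Data column (assumed to sit in A2:A7).
data = ["99123456", "98321889", "99456777", "98231666", "99221457", "98997000"]

# MID(A2:A7, {3,4,5}, 1) + TOCOL: one flat list of the digits at positions 3-5.
digits = [int(s[i]) for s in data for i in (2, 3, 4)]

# N(a = TOROW(a)): a square 0/1 matrix marking which entries are equal.
equal = [[int(a == b) for b in digits] for a in digits]

# MMULT(..., column of ones): multiplying by ones is a row sum,
# so each entry gets the number of times its digit occurs.
counts = [sum(row) for row in equal]

# HSTACK + UNIQUE: one (digit, count) pair per distinct digit,
# kept in order of first appearance.
pairs = list(dict.fromkeys(zip(digits, counts)))

# SORT(..., 2, -1) + TAKE(..., 3, 1): sort by count, descending, keep 3 digits.
top3 = [d for d, _ in sorted(pairs, key=lambda p: p[1], reverse=True)[:3]]
print(top3)  # [2, 1, 3]
```

The comparison matrix here is 18 by 18, one row and column per extracted digit, which is why the row sums repeat for repeated digits and a de-duplication step is needed afterwards.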
Procedures
To implement the above steps in Excel, we can use the following formula:
=LET(
α, TOCOL(MID(A2:A7, {3,4,5}, 1) + 1),
TAKE(SORT(UNIQUE(HSTACK(α, MMULT(N(α=TOROW(α)), α^0))), 2, -1), 3, 1) - 1
)
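This formula uses the LET function to define a variable α, which holds the single-column array of extracted digits, each increased by 1. The offset matters because α^0 is used to build the column of ones for MMULT, and Excel returns #NUM! for 0^0; the final - 1 undoes the offset. The rest of the formula follows the steps listed above.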
Example
To illustrate how the formula works, let's use the data from the introduction and enter the formula once in cell B2. Because it returns a dynamic array, the three results spill into B2:B4 automatically; the formula does not need to be copied down.
Data | Formula | Result |
---|---|---|
99123456 | =LET(α, TOCOL(MID(A2:A7, {3,4,5}, 1) + 1), TAKE(SORT(UNIQUE(HSTACK(α, MMULT(N(α=TOROW(α)), α^0))), 2, -1), 3, 1) - 1) | 2 |
98321889 | (spilled from B2) | 1 |
99456777 | (spilled from B2) | 3 |
98231666 | | |
99221457 | | |
98997000 | | |
The formula returns 2, 1, and 3 as the most common digits in positions 3-5 of the values, as expected.
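The result can also be checked independently with a plain tally. The snippet below is a small Python sketch for verification only, assuming the same six sample values; it is not how the Excel formula works internally.

```python
from collections import Counter

# Sample values from the article's Data column.
data = ["99123456", "98321889", "99456777", "98231666", "99221457", "98997000"]

# Count every character found at positions 3-5 (zero-based slice 2:5).
tally = Counter(ch for s in data for ch in s[2:5])

# Show the three most frequent digits with their counts.
for digit, count in tally.most_common(3):
    print(digit, count)
# 2 5
# 1 4
# 3 3
```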
Other approaches
There are other ways to find the three most common digits in positions 3-5 of a numeric string in Excel. For example, the following formula is based on the FREQUENCY and MATCH functions:
=LET(
data, $A$2:$A$7,
s_10, SEQUENCE(10,,0),
s_9, SEQUENCE(9,,0),
fr, FREQUENCY(--MID(data, {3,4,5}, 1), s_9) - s_10/100,
MATCH(LARGE(fr, {1;2;3}), fr, 0) - 1
)
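This formula counts how often each digit from 0 to 9 occurs using the FREQUENCY function. The bins 0 through 8 (s_9) produce ten counts, because FREQUENCY always returns one extra element for values above the last bin, which here is the digit 9. It then subtracts digit/100 from each count (s_10/100), so that no two scores are equal and, among digits with the same count, the smaller digit ranks higher. Finally, the LARGE function picks the top 3 scores, and the MATCH function returns their positions, which are one more than the corresponding digits.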
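The tie-breaking trick is easier to see with the intermediate values written out. Below is a minimal Python sketch of the same idea, assuming the sample data from the introduction; the names `freq` and `fr` are illustrative and merely echo the formula's variables.

```python
# Sample values from the article's Data column.
data = ["99123456", "98321889", "99456777", "98231666", "99221457", "98997000"]

# Digits at positions 3-5 as integers (the --MID(...) part).
digits = [int(ch) for s in data for ch in s[2:5]]

# One count per digit 0..9, like FREQUENCY with bins 0..8 plus the overflow bin.
freq = [digits.count(d) for d in range(10)]

# Subtract d/100 so equal counts get distinct scores, smaller digit first.
fr = [freq[d] - d / 100 for d in range(10)]

# LARGE(fr, {1;2;3}) then MATCH(..., fr, 0): positions of the three top scores.
# Python lists are zero-based, so the index is already the digit (no "- 1").
top3 = [fr.index(score) for score in sorted(fr, reverse=True)[:3]]
print(top3)  # [2, 1, 3]
```

Because the largest fraction subtracted is 0.09, the adjustment can reorder digits only when their counts are equal; it never lets a digit with a lower count overtake one with a higher count.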