A Guide to Excel Formulas for Determining the Most Common 3 Digits in a Numeric Data Set

Sometimes you have a column of numeric data and need to find the three digits that occur most often in character positions 3 to 5, regardless of which of those three positions a digit occupies. For example, given the following data:

Table

Data
99123456
98321889
99456777
98231666
99221457
98997000

Here the most common digits are 2, 1, and 3: across positions 3-5 of all six values, 2 occurs five times, 1 occurs four times, and 3 occurs three times.

In this article, we will show you how to use Excel formulas to solve this problem. We will also explain the basic theory behind the formulas and provide a detailed example with real numbers.

Basic theory

To find the three most common digits in positions 3-5 of each value, follow these steps:

  • Extract the digits at positions 3, 4, and 5 from each number using the MID function, and add 1 to each. Adding 1 converts the text digits to numbers and keeps every value above 0; this matters because the formula later raises each value to the power 0 to build a column of ones, and 0^0 returns a #NUM! error in Excel.
  • Convert the extracted digits into a single column array using the TOCOL function.
  • Compare that column with a horizontal copy of itself, produced by the TOROW function, using the equal sign (=). The result is a square matrix of TRUE/FALSE values.
  • Convert the comparison results into numbers using the N function, where TRUE becomes 1 and FALSE becomes 0.
  • Multiply that matrix by the column of ones using the MMULT function. This sums each row and gives the number of occurrences of each digit.
  • Horizontally stack the digit column and the count column using the HSTACK function.
  • Remove the duplicate rows with the UNIQUE function, then sort the array by the count column in descending order with the SORT function.
  • Take the first 3 rows of the first column with the TAKE function, then subtract 1 from each value with the minus operator (-) to recover the original digits.
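
The counting trick in the comparison and MMULT steps is easier to follow on a tiny array. The sketch below is illustrative only: the name v and the values {2;3;2} are made up for the demonstration, and it assumes a dynamic-array version of Excel and a locale that uses semicolons as row separators in array constants.

```
=LET(
  v, {2;3;2},
  MMULT(N(v = TOROW(v)), v^0)
)
```

Comparing v with TOROW(v) gives a 3x3 grid of TRUE/FALSE values, N turns it into 1s and 0s, and multiplying by the column of ones v^0 sums each row. The result is {2;1;2}: the value 2 occurs twice and the value 3 occurs once.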

Procedures

To implement the above steps in Excel, we can use the following formula:

=LET (
  α, TOCOL (MID (A2:A7, {3,4,5},1)+1),
  TAKE (SORT (UNIQUE (HSTACK (α, MMULT (N (α=TOROW (α)), α^0))), 2, -1), 3, 1) - 1
)

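This formula uses the LET function to define a variable α, which holds the extracted digits (each increased by 1) as a single-column array. The rest of the formula follows the steps listed earlier. TOCOL, TOROW, HSTACK, and TAKE are available only in versions of Excel with dynamic arrays, such as Microsoft 365.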
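
For readability, the same calculation can be written with one name per step. This is a sketch of an equivalent layout, not a different method; the names digits, matches, counts, pairs, and ranked are introduced here for illustration and do not appear in the original formula. It assumes the data is in A2:A7.

```
=LET(
  digits,  TOCOL(MID(A2:A7, {3,4,5}, 1) + 1),
  matches, N(digits = TOROW(digits)),
  counts,  MMULT(matches, digits^0),
  pairs,   UNIQUE(HSTACK(digits, counts)),
  ranked,  SORT(pairs, 2, -1),
  TAKE(ranked, 3, 1) - 1
)
```

Replacing the last line with ranked (or any other name) shows the intermediate array at that step, which can help when debugging.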

Example

To illustrate how the formula works, let's use the data from the introduction and enter the formula in cell B2. Because it is a dynamic array formula, it only needs to be entered once: the three results spill automatically into B2:B4, so there is no need to copy it down.

Table

Data (A2:A7)    Result (B2:B4, spilled from the formula in B2)
99123456        2
98321889        1
99456777        3
98231666
99221457
98997000

The formula returns 2, 1, and 3 as the most common digits in positions 3-5 of the values, as expected.
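
The result can be cross-checked with a plain count per digit. The following sketch, again assuming the data is in A2:A7 and a dynamic-array version of Excel with MAP and LAMBDA, lists each digit 0-9 next to the number of times it occurs in positions 3-5. The names d and k are illustrative.

```
=LET(
  d, SEQUENCE(10, , 0),
  HSTACK(d, MAP(d, LAMBDA(k, SUM(--(--MID(A2:A7, {3,4,5}, 1) = k)))))
)
```

For the sample data, the rows for digits 2, 1, and 3 show counts of 5, 4, and 3, which matches the ranking above.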

Other approaches

There are other ways to find the most common 3 digits in position 3-5 of a numeric string in Excel. For example, you can use the following formula, which is based on the FREQUENCY and MATCH functions:

=LET (
  data, $A$2:$A$7,
  s_10, SEQUENCE (10,,0),
  s_9, SEQUENCE (9,,0),
  fr, FREQUENCY (--MID (data, {3,4,5},1), s_9) - s_10/100,
  MATCH (LARGE (fr, {1;2;3}), fr, 0) - 1
)

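This formula counts how often each digit 0-9 occurs using the FREQUENCY function. The bins 0-8 are enough because FREQUENCY returns one extra element for values above the last bin, which here is the count for 9. The formula then subtracts digit/100 from each count so that no two values are equal; if two digits have the same count, the smaller digit ranks higher. Finally, LARGE finds the top 3 adjusted counts and MATCH returns their positions, which are one more than the corresponding digits, hence the - 1.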
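
To see the tie-breaking adjustment at work, the final MATCH step can be swapped for an expression that displays the adjusted counts next to their digits. This is only a diagnostic sketch built from the formula above, with the bins written inline instead of using s_9.

```
=LET(
  data, $A$2:$A$7,
  s_10, SEQUENCE(10, , 0),
  fr, FREQUENCY(--MID(data, {3,4,5}, 1), SEQUENCE(9, , 0)) - s_10/100,
  HSTACK(s_10, fr)
)
```

For the sample data, digits 2, 1, and 3 get adjusted values of 4.98, 3.99, and 2.97 (counts of 5, 4, and 3 minus 0.02, 0.01, and 0.03), which are the three largest values in the second column.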
