Openpyxl is a Python library that allows you to read and write Excel files. One of the features of openpyxl is that it supports formulae, including array formulae. Array formulae are special formulae that perform calculations on multiple values or ranges of cells and return one or more results. They are useful for performing complex calculations or operations that cannot be done with regular formulae.
In this article, we will explain the basic theory of array formulae, how to insert them using openpyxl, and how to use them in a practical scenario. We will also compare them with other approaches that can achieve similar results.
Basic theory of array formulae
An array formula is a formula that operates on an array of values or ranges of cells, rather than a single value or cell. An array is a collection of values or cells that are arranged in rows and columns. For example, the range A1:B5 is an array of 10 values, arranged in 2 columns and 5 rows.
An array formula can perform calculations on the entire array, or on parts of the array, depending on the function used. For example, the function SUM can add up all the values in an array, while the function MAX can return the largest value in an array.
An array formula can also return an array of results, rather than a single result. For example, the function TRANSPOSE can switch the rows and columns of an array, and return a new array. The function MMULT can multiply two arrays and return a new array.
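To make the idea of an array-valued result concrete outside Excel, the following plain-Python sketch mimics what TRANSPOSE and MMULT compute, using nested lists in place of cell ranges. The helper names transpose and mmult and the sample numbers are illustrative only; they are not part of Excel or openpyxl.

```python
def transpose(rows):
    """Swap rows and columns, as Excel's TRANSPOSE does."""
    return [list(col) for col in zip(*rows)]


def mmult(a, b):
    """Matrix product of a (n x k) and b (k x m), as Excel's MMULT does."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# A 5-row, 2-column array, shaped like the range A1:B5.
data = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

flipped = transpose(data)       # 2 rows, 5 columns
product = mmult(flipped, data)  # (2 x 5) times (5 x 2) gives 2 x 2

print(flipped)
print(product)
```

Both functions return a whole array rather than a single number, which is why the corresponding Excel formulae need a range of output cells.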
To enter an array formula in Excel, type the formula as usual and confirm it with Ctrl+Shift+Enter instead of Enter. Excel then displays the formula enclosed in curly braces { } in the formula bar to mark it as an array formula. The braces are added by Excel; typing them by hand does not work. For example, {=SUM(A1:B5)} is an array formula that adds up all the values in the range A1:B5 and returns a single result. The formula {=TRANSPOSE(A1:B5)} switches the rows and columns of the range A1:B5 and returns a new array of 2 rows and 5 columns. Because it returns several values, the whole output range must be selected before the formula is entered.
How to insert array formulae using openpyxl
The module openpyxl.worksheet.formula provides classes for working with special kinds of formulae, including array formulae.
To insert an array formula using openpyxl, use the class ArrayFormula. Its constructor takes two arguments: the reference of the cell or range that the formula occupies, and the formula itself, including the leading equals sign. The resulting object is then assigned to a cell as its value. For example, the following code creates an array formula that adds up all the values in the range A1:B5 and enters it in the cell C1:
from openpyxl import Workbook
from openpyxl.worksheet.formula import ArrayFormula

wb = Workbook()
ws = wb.active

# Single-cell array formula: the reference and the target cell are both C1.
ws["C1"] = ArrayFormula("C1", "=SUM(A1:B5)")
To insert an array formula that returns an array of results, pass the entire output range as the reference, but assign the ArrayFormula object to the top-left cell of that range. For example, the following code creates an array formula that switches the rows and columns of the range A1:B5 and displays the result in the range C1:G2:
from openpyxl import Workbook
from openpyxl.worksheet.formula import ArrayFormula

wb = Workbook()
ws = wb.active

# The formula spans C1:G2 but is attached to the top-left cell, C1.
ws["C1"] = ArrayFormula("C1:G2", "=TRANSPOSE(A1:B5)")
Note that the object must be assigned to a single cell, not to the range itself. Indexing a worksheet with a range such as ws["C1:G2"] returns a tuple of cells rather than one cell, so the following code raises an error (an AttributeError in current openpyxl versions) instead of creating the formula:
from openpyxl import Workbook
from openpyxl.worksheet.formula import ArrayFormula

wb = Workbook()
ws = wb.active

# Fails: a value cannot be assigned to a range key.
ws["C1:G2"] = ArrayFormula("C1:G2", "=TRANSPOSE(A1:B5)")
Practical scenario: using array formulae to calculate sales commission
To illustrate the use of array formulae in a practical scenario, let us consider the following example. Suppose we have a table of sales data for a company, where each row represents a salesperson, and each column represents a month. The table also shows the sales target and the commission rate for each salesperson. The table is stored in an Excel file called sales.xlsx, and looks like this:
| Name | Target | Rate | Jan | Feb | Mar | Apr | May | Jun |
|---|---|---|---|---|---|---|---|---|
| Alice | 10000 | 10% | 8000 | 9000 | 11000 | 12000 | 10000 | 9500 |
| Bob | 15000 | 15% | 10000 | 12000 | 13000 | 14000 | 16000 | 17000 |
| Charlie | 20000 | 20% | 15000 | 18000 | 19000 | 21000 | 22000 | 23000 |
We want to calculate the sales commission for each salesperson and each month, based on the following rules:
- If the sales amount is less than the target, the commission is zero.
- If the sales amount is equal to or greater than the target, the commission is the sales amount multiplied by the commission rate.
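Before turning to Excel formulae, the two rules can be checked in plain Python. The sketch below copies the figures from the table; the function name commission and the dictionary layout are our own choices for illustration. Rates are kept as whole percentages to avoid floating-point rounding noise.

```python
def commission(sales, target, rate_percent):
    """Return the commission for one month under the two rules above."""
    if sales < target:
        return 0
    return sales * rate_percent / 100


# name: (target, rate in percent, monthly sales Jan..Jun), copied from the table.
table = {
    "Alice": (10000, 10, [8000, 9000, 11000, 12000, 10000, 9500]),
    "Bob": (15000, 15, [10000, 12000, 13000, 14000, 16000, 17000]),
    "Charlie": (20000, 20, [15000, 18000, 19000, 21000, 22000, 23000]),
}

monthly = {
    name: [commission(s, target, rate) for s in sales]
    for name, (target, rate, sales) in table.items()
}

for name, values in monthly.items():
    print(name, values)
```

For Alice, for instance, only March, April, and May reach the target of 10000, so her commissions are 1100, 1200, and 1000 for those months and zero otherwise.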
We also want to calculate the total commission for each salesperson (across all months), the total commission for each month (across all salespeople), and the grand total commission for the entire table.