A pivot table is an Excel tool for summarizing and analyzing a large data set. It stays linked to its source data, so changes in the source are reflected in the pivot table once it is refreshed. Editing that source data from Python can cause problems, however. For example, if you load the workbook with openpyxl, write a pandas dataframe into the source data sheet, and save the file, the pivot table linking may break, leading to errors or incorrect results.
To avoid this, use a Python library that preserves pivot table linking after editing the sheets. One such library is xlwings, a high-level interface between Python and Excel. Because xlwings drives a running Excel application rather than rewriting the file on disk, you can read and write data without losing the pivot table functionality. You can also create and manipulate pivot tables from Python through the Excel object model that xlwings exposes.
To use xlwings, install it with pip or conda; it requires a local installation of Excel on Windows or macOS. The xlwings add-in is only needed if you want to call Python from Excel, not for scripts that drive Excel from Python. Then import xlwings in your Python code and connect to an existing workbook or create a new one. You can access worksheets through the wb.sheets collection and ranges through the sht.range method, where wb is the workbook object and sht is the sheet object. You read and write data through a range's value attribute. Writing accepts pandas dataframes and numpy arrays directly; reading returns plain Python values or nested lists by default, so use the options method to choose a converter (for example pd.DataFrame or np.array) and settings such as index and header.
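The read/write workflow above can be sketched as two small helpers. This is a minimal sketch, not a definitive implementation: the function names overwrite_source and refresh_pivots are hypothetical, the sheet and book arguments are assumed to be xlwings Sheet and Book objects, and the refresh step assumes Windows, where the api property returns the COM workbook with its RefreshAll method.

```python
def overwrite_source(sheet, header, rows, anchor="A1"):
    """Replace the table that starts at `anchor` with header + rows, in place.

    `sheet` is assumed to be an xlwings Sheet (hypothetical helper).
    """
    top_left = sheet.range(anchor)
    # Clear the old values only; formatting and the sheet itself are kept,
    # so anything that points at this sheet stays connected.
    top_left.expand().clear_contents()
    # Assigning a nested list writes a two-dimensional block from the anchor.
    top_left.value = [header] + rows
    # Return the address of the block that now holds the data.
    return top_left.expand().address


def refresh_pivots(book):
    """Ask Excel to refresh all pivot tables and queries in the workbook.

    Assumption: Windows, where `book.api` is the COM Workbook object.
    """
    book.api.RefreshAll()


if __name__ == "__main__":
    import xlwings as xw  # requires a local Excel installation

    wb = xw.Book()  # new workbook; pass a path to open an existing one
    print(overwrite_source(wb.sheets[0], ["Region", "Sales"], [["North", 100]]))
    refresh_pivots(wb)
```

Note that if the pivot table's source is a fixed range and the new data has more rows than before, the pivot source range may also need to be extended; refreshing alone only re-reads the range the pivot already points at.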
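To create a pivot table, the examples here use a sht.pivots.add method. As far as its documentation shows, xlwings does not ship a pivots collection; it exposes pivot tables through the underlying Excel object model (the api property). Treat sht.pivots as a thin wrapper you would supply yourself. It takes the following arguments: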
- source: the range of the source data
- target: the range of the top-left cell of the pivot table
- fields: a dictionary that maps the field names to the pivot area (row, column, page, or data)
- sortby: the field name to sort the pivot table by
- aggregate: the aggregation function to use for the data fields (sum, count, average, etc.)
- report_layout: the layout of the pivot table (outline, tabular, or compact)
- subtotal: whether to show subtotals for the row and column fields
- grand_total: whether to show grand totals for the row and column fields
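The arguments listed above map onto standard Excel object-model calls, which xlwings reaches through the api property. Below is a minimal sketch under stated assumptions: it targets Windows (where api returns COM objects), the helper name add_pivot and the default table name SalesPivot are hypothetical, and the numeric constants are the Excel enumeration values for xlDatabase, the pivot field orientations, and the consolidation functions. Sorting, layout, and total options are left out.

```python
# Excel enumeration values, written out so the sketch has no extra imports.
XL_DATABASE = 1                                      # XlPivotTableSourceType.xlDatabase
ORIENTATION = {"row": 1, "column": 2, "page": 3}     # XlPivotFieldOrientation
FUNCTION = {"sum": -4157, "count": -4112, "average": -4106}  # XlConsolidationFunction


def add_pivot(book, source, target, fields, aggregate="sum", name="SalesPivot"):
    """Create a pivot table through the Excel object model (hypothetical helper).

    book   -- xlwings Book
    source -- xlwings Range holding the data, including the header row
    target -- xlwings Range for the top-left cell of the pivot table
    fields -- dict mapping field name to "row", "column", "page" or "data"
    """
    # A pivot cache holds the link to the source range.
    cache = book.api.PivotCaches().Create(
        SourceType=XL_DATABASE, SourceData=source.api
    )
    pivot = cache.CreatePivotTable(TableDestination=target.api, TableName=name)
    for field, area in fields.items():
        if area == "data":
            # Data fields get a caption such as "Sum of Sales".
            caption = f"{aggregate.title()} of {field}"
            pivot.AddDataField(pivot.PivotFields(field), caption, FUNCTION[aggregate])
        else:
            pivot.PivotFields(field).Orientation = ORIENTATION[area]
    return pivot
```

Usage would look like add_pivot(wb, sht.range("A1").expand(), sht2.range("A3"), {"Region": "row", "Product": "column", "Sales": "data"}), with wb, sht, and sht2 being a workbook and two sheets already open in Excel.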
For example, the following code creates a pivot table from a dataframe called df, which has four columns: Region, Product, Sales, and Quantity.
import xlwings as xw
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({
    "Region": ["North", "South", "East", "West"] * 3,
    "Product": ["A", "B", "C"] * 4,
    "Sales": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200],
    "Quantity": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
})
# Connect to a new workbook
wb = xw.Book()
# Write the dataframe to the first sheet
sht = wb.sheets[0]
sht.range("A1").options(index=False).value = df
# Create a pivot table in the second sheet
# (sht.pivots is the interface described above; it is not a built-in xlwings attribute)
sht2 = wb.sheets.add()
sht2.pivots.add(
    source=sht.range("A1").expand(),
    target=sht2.range("A1"),
    fields={
        "Region": "row",
        "Product": "column",
        "Sales": "data",
        "Quantity": "data",
    },
    sortby="Region",
    aggregate="sum",
    report_layout="outline",
    subtotal=True,
    grand_total=True,
)
The result is a pivot table like this:
Sum of Sales, by Region (rows) and Product (columns):
Region | A | B | C | Grand Total |
---|---|---|---|---|
North | 100 | 500 | 900 | 1500 |
South | 1000 | 200 | 600 | 1800 |
East | 700 | 1100 | 300 | 2100 |
West | 400 | 800 | 1200 | 2400 |
Grand Total | 2200 | 2600 | 3000 | 7800 |

Sum of Quantity, by Region (rows) and Product (columns):
Region | A | B | C | Grand Total |
---|---|---|---|---|
North | 10 | 50 | 90 | 150 |
South | 100 | 20 | 60 | 180 |
East | 70 | 110 | 30 | 210 |
West | 40 | 80 | 120 | 240 |
Grand Total | 220 | 260 | 300 | 780 |
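The expected pivot values can be cross-checked without Excel. The sketch below rebuilds the same sample dataframe and summarizes it with pandas' pivot_table, using margins to produce the grand totals. It is only a sanity check on the numbers, not a replacement for the Excel pivot table, and pandas orders the regions alphabetically rather than in source order.

```python
import pandas as pd

# Same sample data as in the example above.
df = pd.DataFrame({
    "Region": ["North", "South", "East", "West"] * 3,
    "Product": ["A", "B", "C"] * 4,
    "Sales": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200],
    "Quantity": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
})


def cross_tab(frame, value):
    """Sum `value` by Region (rows) and Product (columns), with grand totals."""
    return pd.pivot_table(
        frame,
        index="Region",
        columns="Product",
        values=value,
        aggfunc="sum",
        margins=True,
        margins_name="Grand Total",
    )


sales = cross_tab(df, "Sales")
quantity = cross_tab(df, "Quantity")
print(sales)
print(quantity)
```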
You can also modify or delete an existing pivot table through the same sht.pivots interface, which behaves as a collection of the pivot tables on the sheet. You can access a specific pivot table by its name or index, and use the methods and attributes of the pivot table object to change its properties. For example, the following code changes the report layout and the aggregation function of the first pivot table in the sheet:
# Get the first pivot table in the sheet
pt = sht2.pivots[0]
# Change the report layout to tabular
pt.report_layout = "tabular"
# Change the aggregation function to average
pt.data_fields[0].aggregate = "average"
pt.data_fields[1].aggregate = "average"
The result is a pivot table like this. Because each Region and Product combination occurs only once in the sample data, the per-row averages equal the original values; only the totals change:
| Region | Product | Average of Sales | Average of Quantity |
|---|---|---|---|
| North | A | 100 | 10 |
| | B | 500 | 50 |
| | C | 900 | 90 |
| North Total | | 500 | 50 |
| South | A | 1000 | 100 |
| | B | 200 | 20 |
| | C | 600 | 60 |
| South Total | | 600 | 60 |
| East | A | 700 | 70 |
| | B | 1100 | 110 |
| | C | 300 | 30 |
| East Total | | 700 | 70 |
| West | A | 400 | 40 |
| | B | 800 | 80 |
| | C | 1200 | 120 |
| West Total | | 800 | 80 |
| Grand Total | A | 550 | 55 |
| | B | 650 | 65 |
| | C | 750 | 75 |
| Grand Total | | 650 | 65 |
To delete a pivot table, you can use the pt.delete() method.
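The layout change, aggregation change, and deletion described above can also be written directly against the Excel object model. The sketch below assumes Windows (COM objects behind the api property), a pivot table named SalesPivot, and data fields captioned "Sum of Sales" and "Sum of Quantity"; the helper names restyle_pivot and remove_pivot are hypothetical. The constants are the Excel values for xlTabularRow and xlAverage.

```python
XL_TABULAR_ROW = 1     # XlLayoutRowType.xlTabularRow
XL_AVERAGE = -4106     # XlConsolidationFunction.xlAverage


def restyle_pivot(sheet, pivot_name, data_captions):
    """Switch a pivot table to tabular layout and average its data fields.

    sheet         -- xlwings Sheet that holds the pivot table
    pivot_name    -- name (or 1-based index) of the pivot table
    data_captions -- captions of the data fields, e.g. ["Sum of Sales"]
    """
    pivot = sheet.api.PivotTables(pivot_name)
    pivot.RowAxisLayout(XL_TABULAR_ROW)
    for caption in data_captions:
        # Changing Function keeps the old caption unless it is renamed too.
        pivot.PivotFields(caption).Function = XL_AVERAGE
    return pivot


def remove_pivot(sheet, pivot_name):
    """Delete a pivot table by clearing the full range it occupies."""
    sheet.api.PivotTables(pivot_name).TableRange2.Clear()
```

A call such as restyle_pivot(sht2, "SalesPivot", ["Sum of Sales", "Sum of Quantity"]) would apply both changes in one step; remove_pivot(sht2, "SalesPivot") clears the table, including any page fields.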