Editing Excel Data and Creating Pivot Tables with Python and xlwings

A pivot table is an Excel tool for summarizing and analyzing a large data set. It stays linked to its source data, so changes in the source data show up in the pivot table once it is refreshed. Editing the source data from Python can cause problems, however. For example, using openpyxl to write a pandas dataframe into the source data sheet rewrites the workbook file and may break the pivot table link, causing errors or incorrect results.

To avoid this, use a Python library that preserves the pivot table link after the sheets are edited. One such library is xlwings, a high-level interface between Python and Excel. xlwings automates a running Excel instance, so reads and writes happen inside the live workbook and existing pivot tables keep working. xlwings has no dedicated pivot table methods of its own, but on Windows its api attribute exposes the underlying Excel object model, which can create and manipulate pivot tables from Python.

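To use xlwings, install it with pip or conda; it also needs a local installation of Excel. The xlwings add-in is only required for calling Python from Excel (the Run button and user-defined functions), not for driving Excel from a script. Then import xlwings in your Python code and connect to an existing workbook or create a new one. Worksheets are available through the wb.sheets collection and ranges through the sht.range method, where wb is the workbook object and sht is the sheet object. Data is read and written through the value attribute of a range. Writing accepts pandas dataframes and numpy arrays directly; reading returns plain Python values (scalars or lists) unless you request a converter with the options method, for example options(pd.DataFrame). The options method also controls details such as whether the index and header are written.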
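The read and write steps described above can be sketched as two small helpers. This is a minimal sketch, not part of the original article: the names write_frame and read_frame are hypothetical, and the sheet argument is assumed to be an xlwings Sheet such as xw.Book().sheets[0].

```python
import pandas as pd


def write_frame(sheet, df, anchor="A1"):
    """Write df, without its index, starting at the anchor cell."""
    # options(index=False) tells xlwings not to write the index column
    sheet.range(anchor).options(index=False).value = df


def read_frame(sheet, anchor="A1"):
    """Read the contiguous block at the anchor back into a DataFrame."""
    # expand() grows the range to the edges of the filled block;
    # options(pd.DataFrame, ...) asks xlwings to convert the cells,
    # treating the first row as the header and no column as the index
    return (
        sheet.range(anchor)
        .expand()
        .options(pd.DataFrame, index=False, header=True)
        .value
    )
```

With Excel available, write_frame(sht, df) followed by read_frame(sht) should give back the same columns. Numbers read from Excel cells usually come back as floats, so integer columns may change dtype on the way.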

xlwings does not provide a pivot table method of its own. On Windows, a pivot table can be created through the Excel object model, which xlwings exposes via the api attribute of its workbook, sheet, and range objects. The calls and properties involved are:

PivotCaches().Create: builds a pivot cache from the range of the source data (SourceType, SourceData)
CreatePivotTable: places the pivot table at the top-left target cell (TableDestination, TableName)
PivotFields(name).Orientation: assigns a field to a pivot area (row, column, or page)
AddDataField: adds a field to the data area with an aggregation function (sum, count, average, etc.)
AutoSort: sorts the items of a field by a given field name
RowAxisLayout: sets the layout of the pivot table (outline, tabular, or compact)
Subtotals: property of a pivot field that controls which subtotals it shows
RowGrand and ColumnGrand: whether to show grand totals for the rows and columns

For example, the following code creates a pivot table from a dataframe called df, which has four columns: Region, Product, Sales, and Quantity.

Python

import xlwings as xw
import pandas as pd

# Create a sample dataframe
df = pd.DataFrame({
    "Region": ["North", "South", "East", "West"] * 3,
    "Product": ["A", "B", "C"] * 4,
    "Sales": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200],
    "Quantity": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
})

# Connect to a new workbook
wb = xw.Book()

# Write the dataframe to the first sheet
sht = wb.sheets[0]
sht.range("A1").options(index=False).value = df

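# Create a pivot table in the second sheet through the Excel object model.
# The .api attribute is the underlying COM object (Windows with pywin32 only).
sht2 = wb.sheets.add()
source = sht.range("A1").expand()

# Numeric values of the Excel enumerations used below
XL_DATABASE = 1      # XlPivotTableSourceType.xlDatabase
XL_ROW_FIELD = 1     # XlPivotFieldOrientation.xlRowField
XL_COLUMN_FIELD = 2  # XlPivotFieldOrientation.xlColumnField
XL_SUM = -4157       # XlConsolidationFunction.xlSum
XL_ASCENDING = 1     # XlSortOrder.xlAscending
XL_OUTLINE_ROW = 2   # XlLayoutRowType.xlOutlineRow

# "SalesPivot" is an arbitrary name chosen for this example
cache = wb.api.PivotCaches().Create(SourceType=XL_DATABASE, SourceData=source.api)
pt = cache.CreatePivotTable(TableDestination=sht2.range("A1").api, TableName="SalesPivot")

# Region in the rows, Product in the columns, Sales and Quantity summed
pt.PivotFields("Region").Orientation = XL_ROW_FIELD
pt.PivotFields("Product").Orientation = XL_COLUMN_FIELD
pt.AddDataField(pt.PivotFields("Sales"), "Sum of Sales", XL_SUM)
pt.AddDataField(pt.PivotFields("Quantity"), "Sum of Quantity", XL_SUM)

# Sort by Region, use the outline layout, and show grand totals
pt.PivotFields("Region").AutoSort(XL_ASCENDING, "Region")
pt.RowAxisLayout(XL_OUTLINE_ROW)
pt.RowGrand = True
pt.ColumnGrand = True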

The resulting pivot table contains the following values, with the regions sorted alphabetically. The first table lists the sums of Sales and the second the sums of Quantity:

Table

Region	Product			Grand Total
	A	B	C
East	700	1100	300	2100
North	100	500	900	1500
South	1000	200	600	1800
West	400	800	1200	2400
Grand Total	2200	2600	3000	7800

Table

Region	Product			Grand Total
	A	B	C
East	70	110	30	210
North	10	50	90	150
South	100	20	60	180
West	40	80	120	240
Grand Total	220	260	300	780
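The sums a pivot table should show for this sample data can be cross-checked in pandas before opening Excel. The sketch below rebuilds the same dataframe and uses pandas' pivot_table with margins to compute the Sales totals; it only checks the arithmetic and does not touch the workbook.

```python
import pandas as pd

# Same sample data as in the xlwings example
df = pd.DataFrame({
    "Region": ["North", "South", "East", "West"] * 3,
    "Product": ["A", "B", "C"] * 4,
    "Sales": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200],
    "Quantity": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
})

# Region in the rows, Product in the columns, Sales summed;
# margins=True adds the row and column grand totals
sales = df.pivot_table(
    index="Region",
    columns="Product",
    values="Sales",
    aggfunc="sum",
    margins=True,
    margins_name="Grand Total",
)
print(sales)
```

Swapping values="Sales" for values="Quantity" gives the second set of totals.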

You can also modify or delete an existing pivot table. On Windows, sht2.api.PivotTables returns the collection of pivot tables in the sheet through the Excel object model. You can access a specific pivot table by its name or by its 1-based index, and use the methods and properties of the pivot table object to change it. For example, the following code changes the report layout and the aggregation function of the first pivot table in the sheet:

Python

# Get the first pivot table in the sheet (COM collections are 1-based)
pt = sht2.api.PivotTables(1)

# Change the report layout to tabular
XL_TABULAR_ROW = 1  # XlLayoutRowType.xlTabularRow
pt.RowAxisLayout(XL_TABULAR_ROW)

# Change the aggregation function to average
XL_AVERAGE = -4106  # XlConsolidationFunction.xlAverage
pt.DataFields(1).Function = XL_AVERAGE
pt.DataFields(2).Function = XL_AVERAGE

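The pivot table now shows averages. Each combination of Region and Product occurs exactly once in the sample data, so the individual cells are unchanged and only the totals differ. The values are listed below by Region and Product for readability:

Table

Region	Product	Sales	Quantity
East	A	700	70
	B	1100	110
	C	300	30
	Grand Total	700	70
North	A	100	10
	B	500	50
	C	900	90
	Grand Total	500	50
South	A	1000	100
	B	200	20
	C	600	60
	Grand Total	600	60
West	A	400	40
	B	800	80
	C	1200	120
	Grand Total	800	80
Grand Total	A	550	55
	B	650	65
	C	750	75
	Grand Total	650	65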

To delete a pivot table, clear the full range it occupies with pt.TableRange2.Clear(), where pt is the pivot table object from the Excel object model.
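Deleting and rebuilding is rarely needed when only the source data changes. A pivot table does not pick up edits until it is refreshed, so a typical update writes the new data over the old block and then refreshes. The sketch below assumes Windows (the api attribute is the COM object), a new dataframe with the same shape as the old block, and a hypothetical helper name overwrite_and_refresh; workbook and sheet are assumed to be xlwings Book and Sheet objects.

```python
def overwrite_and_refresh(workbook, sheet, df, anchor="A1"):
    """Overwrite the source block in place, then refresh all pivot tables."""
    # Writing through xlwings edits cells in the live workbook, so the
    # existing pivot cache keeps pointing at the same source range
    sheet.range(anchor).options(index=False).value = df
    # Workbook.RefreshAll belongs to the Excel object model; reaching it
    # through .api assumes Windows with pywin32
    workbook.api.RefreshAll()
```

If the new data has fewer rows than the old block, the leftover rows would need to be cleared first, and a block with more rows would need the pivot table's source range extended. Neither case is handled here.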
