Editing Excel Data and Creating Pivot Tables with Python and xlwings

A pivot table is a powerful tool in Excel that allows you to summarize and analyze data from a large data set. A pivot table is linked to its source data, so changes in the source data show up in the pivot table once it is refreshed. However, if you want to edit the source data using Python, you may encounter some problems. For example, if you use openpyxl to write a pandas dataframe into the source data sheet, this may break the pivot table linking and cause errors or incorrect results.

To avoid this, you can use a Python library that edits the workbook through Excel itself. One such library is xlwings, which automates a running Excel application from Python. Because Excel performs the edits, reading and writing cell values leaves existing pivot tables intact. xlwings has no dedicated pivot table interface, but on Windows you can create and manipulate pivot tables from Python through the Excel object model that xlwings exposes.

To use xlwings, install it with pip or conda; it requires a local installation of Excel. The xlwings add-in is only needed if you want to call Python from within Excel. Then, import xlwings in your Python code and connect to an existing workbook or create a new one. You can access worksheets through the wb.sheets collection and ranges through the sht.range method, where wb is the workbook object and sht is the sheet object. You read and write data through the value attribute of a range, which accepts pandas dataframes and numpy arrays when writing and, by default, returns plain Python values (nested lists for a block of cells) when reading. The options method controls the conversion, for example to read a range back as a dataframe or to leave out the dataframe index when writing.
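
The read and write steps described above can be sketched as a few small helpers. This is a minimal sketch, not a definitive implementation: the helper names (write_frame, read_frame, refresh_pivot_tables) are hypothetical, the helpers expect an xlwings sheet object, and the refresh step assumes Windows, where the api attribute of an xlwings object returns the underlying COM object.

```python
import pandas as pd


def write_frame(sheet, frame, anchor="A1"):
    """Write a DataFrame (header row, no index column) starting at `anchor`."""
    sheet.range(anchor).options(index=False).value = frame


def read_frame(sheet, anchor="A1"):
    """Read the contiguous table that starts at `anchor` back into a DataFrame."""
    cell = sheet.range(anchor)
    return cell.options(pd.DataFrame, index=False, expand="table").value


def refresh_pivot_tables(sheet):
    """Refresh every pivot table on `sheet`; return how many were refreshed.

    Assumption: Windows only, because `sheet.api` must be a COM Worksheet.
    """
    pivots = sheet.api.PivotTables()  # COM collection, indexed from 1
    for i in range(1, pivots.Count + 1):
        pivots.Item(i).RefreshTable()
    return pivots.Count
```

With wb = xw.Book("sales.xlsx") (a hypothetical file), calling write_frame(wb.sheets[0], df) replaces the source values, and refresh_pivot_tables on the sheet that holds the pivot table makes it pick up the new values. If the new data has more rows than the old data, the pivot table's source range may also need to be extended.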

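xlwings does not provide a dedicated method for creating pivot tables. Instead, every xlwings object exposes the underlying Excel object through its api attribute. On Windows, where api returns a COM object, you can create a pivot table with the same calls you would use in VBA:

  • wb.api.PivotCaches().Create(SourceType, SourceData): builds a pivot cache from the range of the source data
  • cache.CreatePivotTable(TableDestination, TableName): places the pivot table at the top-left target cell
  • pt.PivotFields(name).Orientation: assigns a field to a pivot area (row, column, page, or data)
  • pt.AddDataField(field, caption, function): adds a data field with an aggregation function (sum, count, average, etc.)
  • pt.PivotFields(name).AutoSort(order, field): sorts the items of a field
  • pt.RowAxisLayout(layout): sets the report layout (compact, tabular, or outline)
  • pt.PivotFields(name).Subtotals: controls which subtotals are shown for a row or column field
  • pt.RowGrand and pt.ColumnGrand: whether to show grand totals for the rows and columns

The layout, orientation, and aggregation arguments are integer values from the Excel enumerations.

For example, the following code creates a pivot table from a dataframe called df, which has four columns: Region, Product, Sales, and Quantity.

Python

import xlwings as xw
import pandas as pd

# Excel enumeration values
XL_DATABASE = 1      # XlPivotTableSourceType.xlDatabase
XL_ROW_FIELD = 1     # XlPivotFieldOrientation.xlRowField
XL_COLUMN_FIELD = 2  # XlPivotFieldOrientation.xlColumnField
XL_SUM = -4157       # XlConsolidationFunction.xlSum
XL_OUTLINE_ROW = 2   # XlLayoutRowType.xlOutlineRow
XL_ASCENDING = 1     # XlSortOrder.xlAscending

# Create a sample dataframe
df = pd.DataFrame({
    "Region": ["North", "South", "East", "West"] * 3,
    "Product": ["A", "B", "C"] * 4,
    "Sales": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200],
    "Quantity": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
})

# Connect to a new workbook
wb = xw.Book()

# Write the dataframe to the first sheet
sht = wb.sheets[0]
sht.range("A1").options(index=False).value = df

# Create a pivot table in a new sheet (Windows only: uses the COM API)
sht2 = wb.sheets.add()
cache = wb.api.PivotCaches().Create(
    SourceType=XL_DATABASE,
    SourceData=sht.range("A1").expand().api
)
pt = cache.CreatePivotTable(
    TableDestination=sht2.range("A1").api,
    TableName="SalesPivot"  # arbitrary name chosen for this example
)

# Assign the fields to the pivot areas
pt.PivotFields("Region").Orientation = XL_ROW_FIELD
pt.PivotFields("Product").Orientation = XL_COLUMN_FIELD
pt.AddDataField(pt.PivotFields("Sales"), "Sum of Sales", XL_SUM)
pt.AddDataField(pt.PivotFields("Quantity"), "Sum of Quantity", XL_SUM)

# Sort by Region, use the outline layout, and show grand totals
pt.PivotFields("Region").AutoSort(XL_ASCENDING, "Region")
pt.RowAxisLayout(XL_OUTLINE_ROW)
pt.RowGrand = True
pt.ColumnGrand = True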

The resulting pivot table summarizes both value fields by Region and Product. Shown separately for each value field, the sums are:

Table

Sum of Sales
Region        A      B      C      Grand Total
East          700    1100   300    2100
North         100    500    900    1500
South         1000   200    600    1800
West          400    800    1200   2400
Grand Total   2200   2600   3000   7800

Table

Sum of Quantity
Region        A      B      C      Grand Total
East          70     110    30     210
North         10     50     90     150
South         100    20     60     180
West          40     80     120    240
Grand Total   220    260    300    780
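
The sums Excel should display can be cross-checked without Excel. The following is a minimal sketch that uses pandas pivot_table on the same sample dataframe; it is an independent check of the numbers, not part of the xlwings workflow, and the variable name sales is chosen only for this illustration.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "Region": ["North", "South", "East", "West"] * 3,
    "Product": ["A", "B", "C"] * 4,
    "Sales": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200],
    "Quantity": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
})

# Sum of Sales by Region (rows) and Product (columns), with grand totals
sales = df.pivot_table(
    index="Region",
    columns="Product",
    values="Sales",
    aggfunc="sum",
    margins=True,
    margins_name="Grand Total",
)
print(sales)
```

Passing values="Quantity" gives the second table, and aggfunc="mean" gives averages instead of sums.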

You can also modify an existing pivot table through the same COM interface. sht2.api.PivotTables() returns the collection of pivot tables in the sheet. You can access a specific pivot table by its name or by its 1-based index, and use the methods and properties of the PivotTable object to change it. For example, the following code changes the report layout and the aggregation function of the first pivot table in the sheet:

Python

# Excel enumeration values
XL_TABULAR_ROW = 1  # XlLayoutRowType.xlTabularRow
XL_AVERAGE = -4106  # XlConsolidationFunction.xlAverage

# Get the first pivot table in the sheet (COM collections are 1-based)
pt = sht2.api.PivotTables(1)

# Change the report layout to tabular
pt.RowAxisLayout(XL_TABULAR_ROW)

# Change the aggregation function to average
pt.DataFields(1).Function = XL_AVERAGE
pt.DataFields(2).Function = XL_AVERAGE

Each combination of Region and Product occurs exactly once in the sample data, so the individual cells keep their values, while the totals now show averages instead of sums:

Table

Average of Sales
Region        A      B      C      Grand Total
East          700    1100   300    700
North         100    500    900    500
South         1000   200    600    600
West          400    800    1200   800
Grand Total   550    650    750    650

Table

Average of Quantity
Region        A      B      C      Grand Total
East          70     110    30     70
North         10     50     90     50
South         100    20     60     60
West          40     80     120    80
Grand Total   55     65     75     65

The PivotTable object has no delete method. To remove a pivot table, clear the range it occupies with pt.TableRange2.Clear().
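
As a minimal sketch of removing a pivot table by name: as far as I know, xlwings itself does not ship a pivot table delete method, so the helpers below go through the Excel COM object model and clear the range the pivot table occupies. The function names are hypothetical, the helpers expect an xlwings sheet object, and they assume Windows.

```python
def pivot_table_names(sheet):
    """Return the names of all pivot tables on an xlwings sheet."""
    pivots = sheet.api.PivotTables()  # COM collection, indexed from 1
    return [pivots.Item(i).Name for i in range(1, pivots.Count + 1)]


def delete_pivot_table(sheet, name):
    """Remove the pivot table called `name` by clearing its whole range."""
    pt = sheet.api.PivotTables(name)
    # TableRange2 covers the pivot table including any page fields
    pt.TableRange2.Clear()
```

For the example above, delete_pivot_table(sht2, "SalesPivot") would remove the pivot table and leave the source data on the first sheet untouched.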
