How to Use the Pandas Melt Command

The Pandas melt command unpivots tabular data, transforming it from wide to long format. To understand this operation, you have to know how pivots work for tabular data.

But don’t worry! In this post, I will explain it to you. If you already understand this transformation well, you should skip to the usage section. But if you’re struggling to understand what the melt command actually does, read on.

How pivots and unpivots work

A pivot takes a list of records and aggregates them across two or more variables in order to show relationships between them.

Long format

Let’s say we have a list of sales transactions for our company in long format.

date	region	revenue
1999-01-02	APAC	354
1999-01-02	EMEA	236
1999-01-03	APAC	700
1999-01-03	APAC	472
1999-01-04	APAC	642
1999-01-04	APAC	645
1999-01-05	APAC	255
1999-01-06	EMEA	711
1999-01-07	APAC	967
1999-01-07	APAC	280

Long format is common for storing data in databases because it’s efficient, but it’s not usually how humans want to consume the information.

Wide format

To get the table into a wide format view of total daily sales per region, we can perform a pivot.

date	APAC	EMEA
1999-01-02	354	236
1999-01-03	1172	0
1999-01-04	1287	0
1999-01-05	255	0
1999-01-06	0	711
1999-01-07	1247	0

This pivot sums the revenue from all transactions in each region for each day. You could also perform any other aggregate function like mean, min, max, etc.

Unpivoting

The melt command is all about converting a table from wide format (as above) into long format. This is called “unpivoting”. But watch out! Pivots are not invertible. Unpivoting is a valid operation on wide format tables, but we do not actually recover the original table in long format. This is because we actually threw some information away when we aggregated the transactions:

date	region	total_revenue
1999-01-02	APAC	354
1999-01-02	EMEA	236
1999-01-03	APAC	1172
1999-01-03	EMEA	0
1999-01-04	APAC	1287
1999-01-04	EMEA	0
1999-01-05	APAC	255
1999-01-05	EMEA	0
1999-01-06	APAC	0
1999-01-06	EMEA	711
1999-01-07	APAC	1247
1999-01-07	EMEA	0

Notice there are new records with 0 revenue and transactions that occured on the same day in the same region are summed. To make this clear, I recommend renaming the value column. In the new long table above I call the column total_revenue instead of revenue.

Usage

The melt() command comes in two forms. If you have a DataFrame called wide_df in wide format, you can unpivot it with:

A top-level Pandas function: pd.melt(wide_df)
A method on the DataFrame: wide_df.melt()

These do exactly the same thing.

For the most basic use case, you’ll probably want to pass in a few arguments:

id_vars should be the row identifier from your wide format table.
value_vars should be the list of columns you want to collapse.
var_name should be the new name of the value column.

pd.melt(
    wide_df,
    id_vars='date',
    value_vars=['APAC', 'EMEA'],
    value_name='total_revenue',
)

Or equivalently:

wide_df.melt(
    id_vars='date',
    value_vars=['APAC', 'EMEA'],
    value_name='total_revenue',
)

Note: if the row identifier you want to use is a Pandas index, you can use the reset_index() method to create a column with the same data:

wide_df = wide_df.reset_index()

Try it yourself!

There’s no better way to grok a new function than playing around with it yourself. Check out my Jupyter notebook on GitHub that creates a long format dummy dataset and pivots/unpivots it. You should experiment with the input arguments on both the pivot_table() and melt() methods. Enjoy!