The Pandas melt command unpivots tabular data, transforming it from wide to long format. To understand this operation, you have to know how pivots work for tabular data.
But don’t worry! In this post, I will explain it to you. If you already understand this transformation well, you should skip to the usage section. But if you’re struggling to understand what the melt command actually does, read on.
How pivots and unpivots work
A pivot takes a list of records and aggregates them across two or more variables in order to show relationships between them.
Let’s say we have a list of sales transactions for our company in long format.
Long format is common for storing data in databases because it’s efficient, but it’s not usually how humans want to consume the information.
To get the table into a wide format view of total daily sales per region, we can perform a pivot.
This pivot sums the revenue from all transactions in each region for each day. You could also perform any other aggregate function like mean, min, max, etc.
The melt command is all about converting a table from wide format (as above) into long format. This is called “unpivoting”. But watch out! Pivots are not invertible. Unpivoting is a valid operation on wide format tables, but we do not actually recover the original table in long format. This is because we actually threw some information away when we aggregated the transactions:
Notice there are new records with 0 revenue and transactions that occured on the same day in the same region are summed. To make this clear, I recommend renaming the value column. In the new long table above I call the column
total_revenue instead of
melt() command comes in two forms. If you have a DataFrame called
wide_df in wide format, you can unpivot it with:
- A top-level Pandas function:
- A method on the DataFrame:
These do exactly the same thing.
For the most basic use case, you’ll probably want to pass in a few arguments:
id_varsshould be the row identifier from your wide format table.
value_varsshould be the list of columns you want to collapse.
var_nameshould be the new name of the value column.
pd.melt( wide_df, id_vars='date', value_vars=['APAC', 'EMEA'], value_name='total_revenue', )
wide_df.melt( id_vars='date', value_vars=['APAC', 'EMEA'], value_name='total_revenue', )
Note: if the row identifier you want to use is a Pandas index, you can use the
reset_index() method to create a column with the same data:
wide_df = wide_df.reset_index()
Try it yourself!
There’s no better way to grok a new function than playing around with it yourself. Check out my Jupyter notebook on GitHub that creates a long format dummy dataset and pivots/unpivots it. You should experiment with the input arguments on both the
melt() methods. Enjoy!