Iterating Over Rows in Pandas

Ben Cook • Posted 2021-01-29 • Last updated 2021-10-15

Generally, you should avoid iterating over rows in a DataFrame. Python for loops are slow because every row passes through the interpreter, and Pandas is not designed for this access pattern. Instead, you should apply vectorized operations, which work on whole columns at once. But look, sometimes, for whatever reason, you’re going to want to do it anyway.
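To make the contrast concrete, here’s a minimal sketch on a small made-up DataFrame. The units and price columns are assumptions for illustration, not part of the dummy dataset used below. The loop computes revenue one row at a time in Python, while the vectorized version multiplies the two columns in a single operation.

```python
import pandas as pd

# Hypothetical sales data; these column names are illustrative assumptions
df = pd.DataFrame({"units": [3, 5, 2], "price": [10, 4, 20]})

# Row-by-row: a Python-level loop touches every row individually
revenue_loop = [row.units * row.price for row in df.itertuples()]

# Vectorized: one column-wise multiplication handled inside pandas/NumPy
df["revenue"] = df["units"] * df["price"]

print(revenue_loop)              # [30, 20, 40]
print(df["revenue"].tolist())    # [30, 20, 40]
```

Both give the same answer, but the vectorized line scales much better as the number of rows grows.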

If you absolutely must iterate over the rows in your DataFrame, use the .itertuples() method:

import pandas as pd

df = pd.read_csv("https://jbencook.s3.amazonaws.com/data/dummy-sales.csv").head()

for row in df.itertuples():
    print(row.Index, row.date)

# Expected result
# 0 1999-01-02
# 1 1999-01-03
# 2 1999-01-04
# 3 1999-01-06
# 4 1999-01-07

This is better than .iterrows() because:

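  1. .itertuples() preserves the data type of each column’s values. .iterrows() returns each row as a Series, which holds a single dtype, so mixed columns get upcast (for example, an int column becomes float next to a float column).
  2. It’s faster, since it doesn’t build a new Series for every row.
  3. You get namedtuples, which let you access the column values as attributes, e.g. row.date. Pandas also includes the index as row.Index for good measure.
  4. The namedtuples are immutable. This is a good thing: with .iterrows(), another popular method for iterating through Pandas rows, writing to a row may or may not change the DataFrame, because depending on the data types you can get a copy rather than a view.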
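A quick way to see points 1 and 4 for yourself is a tiny frame with one integer column and one float column. This is illustrative data, not the dummy dataset.

```python
import pandas as pd

# Illustrative mixed-dtype data: one int column, one float column
df = pd.DataFrame({"units": [3, 5], "price": [9.5, 4.25]})

# .iterrows() yields each row as a Series; a Series has a single dtype,
# so the int column is upcast to float64 alongside the float column
_, row_series = next(df.iterrows())
print(row_series.dtype)      # float64
print(row_series["units"])   # 3.0

# .itertuples() keeps each column's values in their own type
row_tuple = next(df.itertuples())
print(row_tuple.units)       # 3

# Namedtuples are immutable, so an accidental write raises an error
# instead of silently doing nothing
try:
    row_tuple.units = 10
    write_failed = False
except AttributeError:
    write_failed = True
print(write_failed)          # True
```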

One additional thing to point out: if your columns have names that aren’t valid Python identifiers, such as "bad column name", you can still use .itertuples(), but it becomes a little uglier, because Pandas renames those fields to positional names. In this case, pass index=False so the tuple positions line up with df.columns, and then access values by position, either directly or with df.columns.get_loc("bad column name"). Here’s an example with both approaches on our dummy dataset:

region_pos = df.columns.get_loc("region")  # look up the column position once
for row in df.itertuples(index=False):
    print(row[0], row[region_pos])  # row[0] is the first column (date)

# Expected result
# 1999-01-02 APAC
# 1999-01-03 AMER
# 1999-01-04 EMEA
# 1999-01-06 APAC
# 1999-01-07 APAC

That’s not a huge deal, but it’s probably better to rename your column instead.
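Here’s a minimal sketch of both the problem and the fix on a made-up one-row frame (the data is illustrative). Pandas renames the invalid field to a positional name, and .rename() gives you attribute access back. The replacement name bad_column_name is just one choice; any valid identifier works.

```python
import pandas as pd

# Illustrative frame with a column name that isn't a valid identifier
df = pd.DataFrame({"date": ["1999-01-02"], "bad column name": ["APAC"]})

# With index=False, the invalid field gets a positional name:
# "_1", because the column sits at position 1
first = next(df.itertuples(index=False))
print(first._fields)  # ('date', '_1')

# Renaming the column restores readable attribute access
clean = df.rename(columns={"bad column name": "bad_column_name"})
pairs = [(row.Index, row.bad_column_name) for row in clean.itertuples()]
print(pairs)  # [(0, 'APAC')]
```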