Basic Counting in Python - Sparrow Computing

I love fancy machine learning algorithms as much as anyone. But sometimes, you just need to count things. And Python’s built-in data structures make this really easy. Let’s say we have a list of strings:

things = [
    "a",
    "a", "b",
    "a", "b", "c",
    "a", "b", "c", "d",
]

With a list like this, you might care about a few different counts. What’s the count of all items? What’s the count of unique items? How many instances are there of <some value>? How many instances are there of all unique values?

We can answer these questions easily and efficiently with lists, sets and dictionaries. Being very comfortable with these objects is important for writing good Python code. With that said, let’s find all our counts.

Count all values in a list

We’ll start with an easy one:

len(things)

# Expected result
# 10

The len() function works for built-in Python data structures, but it also works with any class that implements the __len__() method. For example, calling len() on a NumPy array returns the size of the first dimension.
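To make the __len__() point concrete, here is a minimal sketch. The Playlist class and its contents are hypothetical, invented only for illustration:

```python
class Playlist:
    """Hypothetical container, used only to illustrate __len__()."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        # len(playlist) delegates to this method
        return len(self._songs)


playlist = Playlist(["intro", "verse", "chorus"])
playlist_length = len(playlist)

# Expected result
# 3
```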

Count unique values in a list

How many unique values are there in a list? Answer this question by first creating a unique collection of values (that is, a set). Then call len() on the set:

len(set(things))

# Expected result
# 4

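One thing to point out here is that things doesn’t have to be a list of strings for this to work. In Python, you can put any hashable object into a set. Built-in immutable types such as strings, numbers and tuples of hashable values are hashable, while mutable containers such as lists and dictionaries are not. Instances of your own classes are hashable by default, but they compare by identity, so two objects holding the same data count as different values. To have them deduplicated by value, implement the __eq__() and __hash__() methods, which handle object equality and object hashes (respectively).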
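As a rough sketch of what that looks like, assume a hypothetical Point class (not part of the original example) whose instances should be treated as equal when their coordinates match:

```python
class Point:
    """Hypothetical 2D point, for illustration only."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Objects that compare equal must return the same hash
        return hash((self.x, self.y))


points = [Point(0, 0), Point(1, 2), Point(0, 0)]
unique_point_count = len(set(points))

# Expected result
# 2
```

The attributes used for hashing should not change while the object is in a set, otherwise lookups can silently fail.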

Count instances of a specific value

How many instances of "a" are there in the list? You can find out with the .count() method:

things.count("a")

# Expected result
# 4

Convenient!

Count instances of all unique values

OK, but what if we want to count the number of instances of all unique values? If you use Pandas or SQL, you will probably recognize this as a group by operation. Python comes with an itertools.groupby() function that can be used for this. But it’s a bit of a pain, because it only groups consecutive equal items, so you have to sort your list before passing it in. And if you forget to sort your list, you don’t get an error. You just get the wrong result.
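Here is a small sketch of that pitfall, using the same things list. The variable names sorted_counts and unsorted_runs are just for illustration:

```python
from itertools import groupby

things = [
    "a",
    "a", "b",
    "a", "b", "c",
    "a", "b", "c", "d",
]

# groupby() only merges adjacent equal items, so sort first
sorted_counts = {
    value: len(list(group)) for value, group in groupby(sorted(things))
}

# Without sorting, every run of equal neighbors becomes its own group
unsorted_runs = [
    (value, len(list(group))) for value, group in groupby(things)
]

# Expected result
# sorted_counts == {'a': 4, 'b': 3, 'c': 2, 'd': 1}
# unsorted_runs starts with ('a', 2), ('b', 1), ('a', 1), ...
```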

Instead, let’s go back to our trusty friend the set. If we loop through all the unique values (the set of values) then we can call the .count() method with each one. That will tell us what we need to know:

for value in set(things):
    print(value, things.count(value))

# Expected result
# a 4
# c 2
# b 3
# d 1

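This is easy, and it is efficient enough for small lists. Two caveats: sets are unordered, so your lines may print in a different order, and each .count() call scans the whole list, so the total work grows with the number of unique values times the length of the list.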

One other cool trick

If you want to know all of these counts for a list, consider creating a dictionary of value counts first. You can build one with a collections.defaultdict, but you can also create it in a one-liner with a dictionary comprehension:

counts = {value: things.count(value) for value in things}

counts

# Expected result
# {'a': 4, 'b': 3, 'c': 2, 'd': 1}
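For comparison, here is a sketch of the collections.defaultdict route mentioned above, alongside collections.Counter, a standard library class built for this kind of tallying. Both walk the list only once. The names default_counts and counter_counts are illustrative:

```python
from collections import Counter, defaultdict

things = [
    "a",
    "a", "b",
    "a", "b", "c",
    "a", "b", "c", "d",
]

# defaultdict(int) starts every missing key at 0
default_counts = defaultdict(int)
for value in things:
    default_counts[value] += 1

# Counter tallies every value in a single pass
counter_counts = Counter(things)

# Expected result
# Both hold the same pairs: 'a': 4, 'b': 3, 'c': 2, 'd': 1
```

Both objects behave like dictionaries, so either could stand in for counts in the lookups that follow.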

Now we have the count of all unique values. But you can also get all the other counts that we discussed above:

# Count all values in the list
sum(counts.values())

# Expected result
# 10

# Count unique values in the list
len(counts)

# Expected result
# 4

# Count instances of a specific value
counts["a"]

# Expected result
# 4