NumPy Norm: Understanding np.linalg.norm()

A norm is a measure of the size of a matrix or vector, and you can compute it in NumPy with the np.linalg.norm() function:

import numpy as np

x = np.eye(4)
np.linalg.norm(x)

# Expected result
# 2.0

When np.linalg.norm() is called on an array-like input without any additional arguments, the default behavior is to compute the L2 norm on a flattened view of the array. This is the square root of the sum of squared elements and can be interpreted as the length of the vector in Euclidean space.

Since the ravel() method flattens an array (returning a view rather than a copy whenever possible) and ord specifies the type of norm to compute, the above usage is equivalent to:

np.linalg.norm(x.ravel(), ord=2)

# Expected result
# 2.0
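
The definition (square root of the sum of squared elements) can also be checked by hand. This is only a quick sanity check, using the same identity matrix as above; the variable names manual_l2 and default_norm are just for illustration:

```python
import numpy as np

x = np.eye(4)

# L2 norm by hand: square root of the sum of squared elements
manual_l2 = np.sqrt(np.sum(x ** 2))

# Same quantity from np.linalg.norm() with no extra arguments
default_norm = np.linalg.norm(x)

print(manual_l2, default_norm)  # both are 2.0
```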

But watch out! The function can calculate many different kinds of norms. And if you specify the ord argument, then matrices (arrays with ndim=2) are treated differently than vectors (arrays with ndim=1). This leads to a somewhat surprising result:

np.linalg.norm(x, ord=2)

# Expected result
# 1.0

That is, even though ord=2 is the default behavior for vectors (and for vectors ord=2 does mean L2 norm), np.linalg.norm(x, ord=2) does not compute the L2 norm of the flattened array if x is 2-dimensional. Somewhat confusingly, for matrices ord=2 means the spectral norm (the largest singular value), which is 1.0 for the identity matrix. And if x has more than 2 dimensions and axis is not given, the call raises an exception.
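
For 2-dimensional input, NumPy documents ord=2 as the 2-norm, i.e. the largest singular value. Here is a minimal sketch to check this, using an illustrative diagonal matrix m that is not part of the example above:

```python
import numpy as np

# Illustrative matrix: its singular values are 4 and 3
m = np.array([[3.0, 0.0],
              [0.0, 4.0]])

# ord=2 on a matrix: the largest singular value
spectral = np.linalg.norm(m, ord=2)
largest_sv = np.linalg.svd(m, compute_uv=False).max()

# Frobenius norm matches the L2 norm of the flattened matrix
frobenius = np.linalg.norm(m, ord='fro')
flat_l2 = np.linalg.norm(m.ravel(), ord=2)

print(spectral, largest_sv)  # both are 4 (up to floating-point rounding)
print(frobenius, flat_l2)    # both are 5 (up to floating-point rounding)
```

So the same ord=2 gives 4 for the matrix and 5 for its flattened version.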

In order to avoid getting tricked by this behavior, it’s worth taking a look at the API and some example use cases.

API

The np.linalg.norm() function has three important arguments: x, ord, and axis.

x: this is an array-like input. If ord and axis are both None, then np.linalg.norm() will return the L2 norm of x.ravel(), which is a flattened (i.e. 1-dimensional) view of the array.
ord: the type of norm. If you just pass in x and ord, leaving axis as None, then x must be 1-dimensional or 2-dimensional, otherwise you will get an exception. Most commonly, when x is a vector, you will want ord=2 or ord=1 for L2 and L1 norms respectively. And when x is a matrix, you will want ord='fro' for the Frobenius norm. But NumPy does support other norms, which you can look up in the NumPy docs.
axis: the axis (or axes) to reduce with the norm operation. If this is an int then you will get vector norms along that dimension and if this is a 2-tuple, then you will get matrix norms along those dimensions.
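
To make the axis behavior concrete, here is a small sketch on a 3-dimensional array. The array batch and the variable names are made up for illustration:

```python
import numpy as np

# Illustrative 3-dimensional array: a batch of two 3x4 matrices
batch = np.arange(24, dtype=float).reshape(2, 3, 4)

# int axis: vector L2 norms along the last dimension -> shape (2, 3)
vector_norms = np.linalg.norm(batch, axis=2)

# 2-tuple axis: Frobenius norm of each 3x4 matrix -> shape (2,)
matrix_norms = np.linalg.norm(batch, ord='fro', axis=(1, 2))

# ord without axis on a 3-dimensional array raises ValueError
try:
    np.linalg.norm(batch, ord=2)
    raised = False
except ValueError:
    raised = True

print(vector_norms.shape, matrix_norms.shape, raised)  # (2, 3) (2,) True
```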

That’s all a little too confusing for my preference. So instead of worrying about the combination of the number of dimensions of your x argument and ord, my recommendation is to use x by itself when you want an L2 norm or Frobenius norm (which is the same as the L2 norm on the flattened matrix):

np.linalg.norm(x)

Sometimes the ord and axis arguments are unavoidable (and I’ll show an example below), but only if 1) you need to reduce one or two of the dimensions or 2) you want to compute a norm other than L2.

Examples

Relative error

Let’s start with an easy example. A great use case for norms is computing the relative error between two arrays. For scalars, relative error is usually calculated with |x - x'| / |x|. Think of this like the size of the difference divided by the size of the original number.

Since norms are a way to encode the size of an array with a single number, you can use norms to do something very similar for arrays:

# Perturb x by adding one random scalar in [0, 0.1) to every element
x_prime = x + np.random.uniform(0, 0.1)
np.linalg.norm(x_prime - x) / np.linalg.norm(x)

# Expected result like...
# 0.05465174120478311

Easy!
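
If you want a result you can verify by hand, a deterministic variant helps. The sketch below wraps the same formula in a hypothetical helper called relative_error (not a NumPy function) and uses a fixed offset instead of a random one:

```python
import numpy as np

def relative_error(x, x_prime):
    """Hypothetical helper: norm of the difference divided by the norm of x."""
    return np.linalg.norm(x_prime - x) / np.linalg.norm(x)

x = np.eye(4)
x_prime = x + 0.05  # fixed offset added to all 16 elements

# ||x_prime - x|| = sqrt(16 * 0.05**2) = 0.2 and ||x|| = 2.0
err = relative_error(x, x_prime)
print(err)  # approximately 0.1
```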

Normalization

You can normalize an array in order to force it to have a norm that you specify. For example, you can generate a random array that has an L2 norm of (approximately) 3. Just multiply every element by 3 and divide by the L2 norm:

x = np.random.uniform(size=10)
x = 3 * x / np.linalg.norm(x)
np.linalg.norm(x)

# Expected result like...
# 2.9999999999999996

If you wanted the vector to have a unit norm, you would simply divide every element by the norm.

Sometimes, you may want to do this for your dataset. Say you have a matrix of data where every row is a sample and every column is a feature. If you want every row to have a unit norm, you can:

1. Compute the row-wise norms (reducing the column dimension)
2. Divide every element by its row norm

Here’s the code to normalize rows by their L2 norms for a randomly generated dataset with 10 rows and 3 columns:

data = np.random.uniform(size=(10, 3))
row_l2_norms = np.linalg.norm(data, axis=1)
data /= row_l2_norms[:, None]

# Now the rows all have an L2 norm of 1
np.linalg.norm(data, axis=1)

# Expected result
# array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

Notice: row_l2_norms will be a vector with size 10. If we want to use broadcasting rules to divide every element in data, which has shape (10, 3), we need to add a dummy dimension to give row_l2_norms shape (10, 1). That’s what the None index is doing.
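
An alternative to indexing with None is the keepdims argument of np.linalg.norm(), which keeps the reduced axis as a dimension of size 1. A minimal sketch, assuming a seeded generator (rng) just to make the run reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
data = rng.uniform(size=(10, 3))

# keepdims=True keeps the reduced axis, so the result has shape (10, 1)
row_l2_norms = np.linalg.norm(data, axis=1, keepdims=True)

# Shapes (10, 3) and (10, 1) broadcast without any extra indexing
normalized = data / row_l2_norms

print(row_l2_norms.shape)  # (10, 1)
```

This does the same thing as row_l2_norms[:, None], and it also works unchanged if you reduce a different axis.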

Additionally, since the input is a matrix and we’re passing in axis=1, the function will compute the vector norm of each row. This means it’s safe to pass in ord=1 to get the row-wise L1 norms:

data = np.random.uniform(size=(10, 3))
row_l1_norms = np.linalg.norm(data, ord=1, axis=1)
data /= row_l1_norms[:, None]

# Now the rows all have an L1 norm of 1
np.linalg.norm(data, ord=1, axis=1)

# Expected result
# array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

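Note that the axis argument is doing the work here: without it, ord=1 on a 2-dimensional array computes a matrix norm (the maximum absolute column sum) instead of row-wise L1 norms, and an array with more than 2 dimensions raises an exception.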
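
The axis argument is what makes ord=1 behave as a vector norm here. A small sketch with an illustrative matrix m (not from the dataset above) shows how the result changes when axis is omitted:

```python
import numpy as np

# Illustrative 2x2 matrix
m = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# With axis=1: L1 vector norm of each row -> [1 + 2, 3 + 4]
row_l1 = np.linalg.norm(m, ord=1, axis=1)

# Without axis: matrix 1-norm, the maximum absolute column sum -> max(4, 6)
matrix_l1 = np.linalg.norm(m, ord=1)

print(row_l1)     # [3. 7.]
print(matrix_l1)  # 6.0
```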

By the way, scikit-learn provides a convenience function so you can more easily normalize rows of a dataset to have L1 or L2 unit norms. Here’s an example of normalizing every row by its L1 norm:

from sklearn import preprocessing

data = np.random.uniform(size=(10, 3))
data = preprocessing.normalize(data, norm='l1')

# Now the rows all have an L1 norm of 1
np.linalg.norm(data, ord=1, axis=1)

# Expected result
# array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

Pairwise distance

You can also use np.linalg.norm() to compute pairwise Euclidean distance between two sets of points. This is a little more involved and I have a separate post about computing pairwise distance.
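
As a rough idea of what that looks like, here is one possible sketch based on broadcasting; it is not necessarily the approach from the other post. The point sets a and b are made up for illustration:

```python
import numpy as np

# Two illustrative sets of 2-dimensional points
a = np.array([[0.0, 0.0],
              [1.0, 0.0]])             # shape (2, 2)
b = np.array([[0.0, 3.0],
              [4.0, 0.0],
              [1.0, 1.0]])             # shape (3, 2)

# Broadcast to all pairwise differences: shape (2, 3, 2)
diffs = a[:, None, :] - b[None, :, :]

# Vector L2 norm over the coordinate axis -> distance matrix of shape (2, 3)
dists = np.linalg.norm(diffs, axis=-1)

print(dists.shape)  # (2, 3)
```

Here dists[i, j] is the Euclidean distance between a[i] and b[j]. Keep in mind that the intermediate diffs array grows with the product of the two set sizes, so this is best suited to small inputs.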