A norm is a measure of the size of a matrix or vector, and you can compute it in NumPy with the `np.linalg.norm()` function:

```python
import numpy as np

x = np.eye(4)
np.linalg.norm(x)
# Expected result
# 2.0
```
When `np.linalg.norm()` is called on an array-like input without any additional arguments, the default behavior is to compute the L2 norm on a flattened view of the array. This is the square root of the sum of squared elements and can be interpreted as the length of the vector in Euclidean space.
Since the `ravel()` method flattens an array without copying (when possible) and `ord` specifies the type of norm that will be computed, the above usage is equivalent to:

```python
np.linalg.norm(x.ravel(), ord=2)
# Expected result
# 2.0
```
But watch out! The function can calculate many different kinds of norms. And if you specify the `ord` argument, then matrices (arrays with `ndim=2`) are treated differently than vectors (arrays with `ndim=1`). This leads to a somewhat surprising result:

```python
np.linalg.norm(x, ord=2)
# Expected result
# 1.0
```
That is, even though `ord=2` is the default behavior for vectors (and for vectors `ord=2` does mean the L2 norm), `np.linalg.norm(x, ord=2)` does not compute the L2 norm if `x` has more than 1 dimension. In fact, somewhat confusingly, `ord=2` means something different for matrices in NumPy: it computes the spectral norm, which is the largest singular value of the matrix.
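You can verify this against the singular values directly. A minimal check, using a small example matrix of my own choosing:

```python
import numpy as np

# ord=2 on a matrix is the spectral norm: the largest singular value.
a = np.array([[3.0, 0.0],
              [4.0, 5.0]])

spectral = np.linalg.norm(a, ord=2)
largest_singular_value = np.linalg.svd(a, compute_uv=False)[0]

print(np.isclose(spectral, largest_singular_value))  # True
print(np.isclose(spectral, np.linalg.norm(a)))       # False: the Frobenius norm differs
```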
In order to avoid getting tricked by this behavior, it’s worth taking a look at the API and some example use cases.
The `np.linalg.norm()` function has three important arguments:

- `x`: an array-like input. If you pass in only `x`, `np.linalg.norm()` will return the L2 norm of `x.ravel()`, which is a flattened (i.e. 1-dimensional) view of the array.
- `ord`: the type of norm. If you pass in `ord` (without `axis`), then `x` must be 1-dimensional or 2-dimensional, otherwise you will get an exception. Most commonly, when `x` is a vector you will want `ord=2` or `ord=1` for the L2 and L1 norms respectively. And when `x` is a matrix, you will want `ord='fro'` for the Frobenius norm. But NumPy does support other norms, which you can look up in the docs.
- `axis`: the axis (or axes) to reduce with the norm operation. If this is an `int`, you will get vector norms along that dimension; if it is a 2-tuple, you will get matrix norms over those two dimensions.
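To make the `axis` behavior concrete, here’s a small sketch (the array shape is chosen arbitrarily):

```python
import numpy as np

x = np.arange(24, dtype=float).reshape(2, 3, 4)

# Integer axis: vector (L2) norms along that dimension.
# Reducing axis=2 leaves a (2, 3) array of norms.
vec_norms = np.linalg.norm(x, axis=2)
print(vec_norms.shape)  # (2, 3)

# 2-tuple axis: matrix (Frobenius) norms over those two dimensions.
# Reducing axes (1, 2) leaves one norm per 3x4 matrix.
mat_norms = np.linalg.norm(x, axis=(1, 2))
print(mat_norms.shape)  # (2,)
```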
That’s all a little too confusing for my preference. So instead of worrying about the combination of the number of dimensions of your `x` argument and `ord`, my recommendation is to use `np.linalg.norm(x)` by itself when you want an L2 norm or a Frobenius norm (which is the same as the L2 norm of the flattened matrix). The `ord` and `axis` arguments are sometimes unavoidable (and I’ll show an example below), but only if 1) you need to reduce one or two of the dimensions or 2) you want to compute a norm other than L2.
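As a quick illustration of case 2, here are a couple of vector norms other than the default L2 (the example values are my own):

```python
import numpy as np

v = np.array([1.0, -7.0, 3.0])

# L2 norm: no extra arguments needed.
print(np.linalg.norm(v))              # sqrt(1 + 49 + 9), about 7.681

# Other norms need ord. L1 is the sum of absolute values...
print(np.linalg.norm(v, ord=1))       # 11.0

# ...and the infinity norm is the largest absolute value.
print(np.linalg.norm(v, ord=np.inf))  # 7.0
```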
Let’s start with an easy example. A great use case for norms is computing the relative error between two arrays. For scalars, relative error is usually calculated as `|x - x'| / |x|`. Think of this as the size of the difference divided by the size of the original number.
Since norms are a way to encode the size of an array with a single number, you can use norms to do something very similar for arrays:
```python
x_prime = x + np.random.uniform(0, 0.1)
np.linalg.norm(x_prime - x) / np.linalg.norm(x)
# Expected result like...
# 0.05465174120478311
```
You can normalize an array in order to force it to have a norm that you specify. For example, you can generate a random array that has an L2 norm of (approximately) 3. Just multiply every element by 3 and divide by the L2 norm:
```python
x = np.random.uniform(size=10)
x = 3 * x / np.linalg.norm(x)
np.linalg.norm(x)
# Expected result
# 2.9999999999999996
```
If you wanted the vector to have a unit norm, you would simply divide every element by the norm.
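For instance, a minimal sketch of unit-norm normalization:

```python
import numpy as np

x = np.random.uniform(size=10)
x = x / np.linalg.norm(x)  # divide every element by the L2 norm

# The result is a unit vector (up to floating-point error)
print(np.isclose(np.linalg.norm(x), 1.0))  # True
```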
Sometimes, you may want to do this for your dataset. Say you have a matrix of data where every row is a sample and every column is a feature. If you want every row to have a unit norm, you can:
- Compute the row-wise norms (reducing the column dimension)
- Divide every element by its row norm
Here’s the code to normalize rows by their L2 norms for a randomly generated dataset with 10 rows and 3 columns:
```python
data = np.random.uniform(size=(10, 3))
row_l2_norms = np.linalg.norm(data, axis=1)
data /= row_l2_norms[:, None]

# Now the rows all have an L2 norm of 1
np.linalg.norm(data, axis=1)
# Expected result
# array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
```
The `row_l2_norms` array will be a vector of size 10. If we want to use broadcasting rules to divide every element in `data`, which has shape `(10, 3)`, we need to add a dummy dimension to give `row_l2_norms` shape `(10, 1)`. That’s what the `None` index is doing.
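A quick shape check illustrates the dummy dimension (`np.newaxis`, shown as an alternative spelling, is an alias for `None`):

```python
import numpy as np

row_l2_norms = np.linalg.norm(np.random.uniform(size=(10, 3)), axis=1)
print(row_l2_norms.shape)                 # (10,)
print(row_l2_norms[:, None].shape)        # (10, 1)

# np.newaxis is an alias for None and reads more explicitly:
print(row_l2_norms[:, np.newaxis].shape)  # (10, 1)
```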
Additionally, since the input is a matrix and we’re passing in `axis=1`, the function will compute the vector norm of each row. This means it’s safe to pass in `ord=1` to get the row-wise L1 norms:
```python
data = np.random.uniform(size=(10, 3))
row_l1_norms = np.linalg.norm(data, ord=1, axis=1)
data /= row_l1_norms[:, None]

# Now the rows all have an L1 norm of 1
np.linalg.norm(data, ord=1, axis=1)
# Expected result
# array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
```
Note that the restriction on `ord` applies only when `axis` is omitted: with an integer `axis`, this row-wise computation works even if `data` has more than 2 dimensions, whereas `np.linalg.norm(data, ord=1)` without `axis` would raise an exception for such arrays.
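To make the dimensionality rules concrete, here is a quick check using an arbitrarily shaped 3-dimensional array:

```python
import numpy as np

stack = np.random.uniform(size=(5, 10, 3))  # e.g. 5 stacked (10, 3) datasets

# With an integer axis, vector norms work for any number of dimensions:
per_row_l1 = np.linalg.norm(stack, ord=1, axis=2)
print(per_row_l1.shape)  # (5, 10)

# But passing ord without axis raises for arrays with ndim > 2:
try:
    np.linalg.norm(stack, ord=1)
    print("no error")
except ValueError:
    print("raises ValueError")
```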
By the way, scikit-learn provides a convenience function so you can more easily normalize rows of a dataset to have L1 or L2 unit norms. Here’s an example of normalizing every row by its L1 norm:
```python
from sklearn import preprocessing

data = np.random.uniform(size=(10, 3))
data = preprocessing.normalize(data, norm='l1')

# Now the rows all have an L1 norm of 1
np.linalg.norm(data, ord=1, axis=1)
# Expected result
# array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
```
You can also use `np.linalg.norm()` to compute the pairwise Euclidean distance between two sets of points. This is a little more involved, and I have a separate post about computing pairwise distance.
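As a teaser, one way to sketch it with broadcasting (the example shapes are my own, not from that post):

```python
import numpy as np

a = np.random.uniform(size=(4, 3))  # 4 points in 3 dimensions
b = np.random.uniform(size=(5, 3))  # 5 points in 3 dimensions

# Broadcast to shape (4, 5, 3), then take vector norms over the last axis.
distances = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
print(distances.shape)  # (4, 5)

# Spot-check one entry against a direct computation:
print(np.isclose(distances[0, 0], np.linalg.norm(a[0] - b[0])))  # True
```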