A norm is a measure of the size of a matrix or vector, and you can compute it in NumPy with the `np.linalg.norm()` function:

```
import numpy as np
x = np.eye(4)
np.linalg.norm(x)
# Expected result
# 2.0
```

When `np.linalg.norm()` is called on an array-like input without any additional arguments, the default behavior is to compute the L2 norm of a flattened view of the array. This is the square root of the sum of squared elements, and it can be interpreted as the length of the vector in Euclidean space.

Since the `ravel()` method flattens an array without making a copy (when possible) and `ord` specifies the type of norm to compute, the above usage is equivalent to:

```
np.linalg.norm(x.ravel(), ord=2)
# Expected result
# 2.0
```

*But watch out!* The function can calculate many different kinds of norms. And if you specify the `ord` argument, then matrices (arrays with `ndim=2`) are treated differently than vectors (arrays with `ndim=1`). This leads to a somewhat surprising result:

```
np.linalg.norm(x, ord=2)
# Expected result
# 1.0
```

That is, even though `ord=2` is the default behavior for vectors (and for vectors `ord=2` *does* mean L2 norm), `np.linalg.norm(x, ord=2)` does *not* compute the L2 norm if `x` has more than 1 dimension. In fact, somewhat confusingly, `ord=2` means something different for matrices in `np.linalg.norm()`: it computes the *spectral norm*, the largest singular value of the matrix.
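You can verify that the matrix `ord=2` norm is the largest singular value with a quick check against `np.linalg.svd()`:

```
import numpy as np

x = np.eye(4)
# For matrices, ord=2 is the spectral norm: the largest singular value
spectral = np.linalg.norm(x, ord=2)
largest_sv = np.linalg.svd(x, compute_uv=False).max()
print(spectral, largest_sv)  # both 1.0 for the identity
```

The identity matrix has all singular values equal to 1, which is why the earlier result was `1.0` rather than `2.0`.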

In order to avoid getting tricked by this behavior, it’s worth taking a look at the API and some example use cases.

### API

The `np.linalg.norm()` function has three important arguments: `x`, `ord`, and `axis`.


- `x`: an array-like input. If `ord` and `axis` are both `None`, then `np.linalg.norm()` will return the L2 norm of `x.ravel()`, which is a flattened (i.e. 1-dimensional) view of the array.
- `ord`: the type of norm. If you pass in `x` and `ord` while leaving `axis` as `None`, then `x` must be 1-dimensional or 2-dimensional, otherwise you will get an exception. Most commonly, when `x` is a vector, you will want `ord=2` or `ord=1` for the L2 and L1 norms respectively. And when `x` is a matrix, you will want `ord='fro'` for the Frobenius norm. NumPy does support other norms, which you can look up in their docs.
- `axis`: the axis (or axes) to reduce with the norm operation. If this is an `int`, then you will get vector norms along that dimension; if this is a 2-tuple, then you will get matrix norms along those dimensions.
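To make the `axis` behavior concrete, here's a small sketch on a 3-dimensional array of ones (the shapes are chosen arbitrarily):

```
import numpy as np

a = np.ones((2, 3, 4))
# An int axis gives vector norms along that axis; the result drops that axis
v = np.linalg.norm(a, axis=2)
print(v.shape)  # (2, 3); each entry is sqrt(4) = 2.0
# A 2-tuple gives matrix norms (Frobenius by default) over those axes
m = np.linalg.norm(a, axis=(1, 2))
print(m.shape)  # (2,); each entry is sqrt(12)
```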

That’s all a little too confusing for my preference. So instead of worrying about the combination of the number of dimensions of your `x` argument and `ord`, my recommendation is to use `x` by itself when you want an L2 norm or Frobenius norm (which is the same as the L2 norm on the flattened matrix):

`np.linalg.norm(x)`
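As a sanity check, for a matrix the no-argument call, the explicit Frobenius norm, and the L2 norm of the flattened array all agree:

```
import numpy as np

x = np.arange(6.0).reshape(2, 3)
default = np.linalg.norm(x)
frobenius = np.linalg.norm(x, ord='fro')
flat_l2 = np.linalg.norm(x.ravel(), ord=2)
# All three equal sqrt(0 + 1 + 4 + 9 + 16 + 25) = sqrt(55)
print(default, frobenius, flat_l2)
```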

Sometimes the `ord` and `axis` arguments are unavoidable (and I’ll show an example below), but only if 1) you need to reduce one or two of the dimensions or 2) you want to compute a norm other than L2.

### Examples

#### Relative error

Let’s start with an easy example. A great use case for norms is computing the relative error between two arrays. For scalars, relative error is usually calculated as `|x - x'| / |x|`. Think of this as the size of the difference divided by the size of the original number.

Since norms are a way to encode the size of an array with a single number, you can use norms to do something very similar for arrays:

```
x_prime = x + np.random.uniform(0, 0.1, size=x.shape)
np.linalg.norm(x_prime - x) / np.linalg.norm(x)
# Expected result like...
# 0.05465174120478311
```

Easy!

#### Normalization

You can normalize an array in order to force it to have a norm that you specify. For example, you can generate a random array that has an L2 norm of (approximately) 3. Just multiply every element by 3 and divide by the L2 norm:

```
x = np.random.uniform(size=10)
x = 3 * x / np.linalg.norm(x)
np.linalg.norm(x)
# Expected result
# 2.9999999999999996
```

If you wanted the vector to have a unit norm, you would simply divide every element by the norm.
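In code, that’s a one-liner:

```
import numpy as np

x = np.random.uniform(size=10)
x /= np.linalg.norm(x)
# np.linalg.norm(x) is now 1.0, up to floating-point error
```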

Sometimes, you may want to do this for your dataset. Say you have a matrix of data where every row is a sample and every column is a feature. If you want every row to have a unit norm, you can:

- Compute the row-wise norms (reducing the column dimension)
- Divide every element by its row norm

Here’s the code to normalize rows by their L2 norms for a randomly generated dataset with 10 rows and 3 columns:

```
data = np.random.uniform(size=(10, 3))
row_l2_norms = np.linalg.norm(data, axis=1)
data /= row_l2_norms[:, None]
# Now the rows all have a L2 norm of 1
np.linalg.norm(data, axis=1)
# Expected result
# array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
```

Notice: `row_l2_norms` will be a vector with size 10. If we want to use broadcasting rules to divide every element in `data`, which has shape `(10, 3)`, we need to add a dummy dimension to give `row_l2_norms` shape `(10, 1)`. That’s what the `None` index is doing.
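The `None` index is one of several equivalent spellings for adding that dummy dimension:

```
import numpy as np

v = np.ones(10)
print(v[:, None].shape)        # (10, 1)
print(v[:, np.newaxis].shape)  # (10, 1); np.newaxis is an alias for None
print(v.reshape(-1, 1).shape)  # (10, 1)
```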

Additionally, since the input is a matrix and we’re passing in `axis=1`, the function will compute the vector norm of each row. This means it’s safe to pass in `ord=1` to get the row-wise L1 norms:

```
data = np.random.uniform(size=(10, 3))
row_l1_norms = np.linalg.norm(data, ord=1, axis=1)
data /= row_l1_norms[:, None]
# Now the rows all have a L1 norm of 1
np.linalg.norm(data, ord=1, axis=1)
# Expected result
# array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
```

Note that an integer `axis` works for `data` of any dimensionality, since it always requests vector norms along that axis. It’s only with `axis=None` that specifying `ord` requires `data` to be 1-dimensional or 2-dimensional.
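As a sketch of where the dimensionality restriction bites: calling the function with `ord` set but no `axis` on a 3-dimensional array raises an exception.

```
import numpy as np

arr = np.ones((2, 3, 4))
try:
    np.linalg.norm(arr, ord=1)  # ord with axis=None needs a 1-D or 2-D input
except ValueError:
    print("ValueError: too many dimensions for this ord/axis combination")
```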

By the way, scikit-learn provides a convenience function so you can more easily normalize rows of a dataset to have L1 or L2 unit norms. Here’s an example of normalizing every row by its L1 norm:

```
from sklearn import preprocessing
data = np.random.uniform(size=(10, 3))
data = preprocessing.normalize(data, norm='l1')
# Now the rows all have a L1 norm of 1
np.linalg.norm(data, ord=1, axis=1)
# Expected result
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
```

#### Pairwise distance

You can also use `np.linalg.norm()` to compute pairwise Euclidean distances between two sets of points. This is a little more involved, and I have a separate post about computing pairwise distance.
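As a quick sketch of the idea, it combines the two tricks above: broadcast with a dummy dimension, then reduce the coordinate axis with a vector norm (the point counts here are arbitrary):

```
import numpy as np

a = np.random.uniform(size=(5, 3))  # 5 points in 3-D
b = np.random.uniform(size=(7, 3))  # 7 points in 3-D
# The difference broadcasts to shape (5, 7, 3); reduce the coordinate axis
dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
print(dists.shape)  # (5, 7); dists[i, j] is the distance from a[i] to b[j]
```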