The meshgrid operation creates multi-dimensional coordinate arrays:
import numpy as np
x = np.arange(2)
y = np.arange(3)
ii, jj = np.meshgrid(x, y, indexing='ij')
ii
# Expected result
# array([[0, 0, 0],
# [1, 1, 1]])
np.meshgrid()
returns a tuple with the inputs broadcast across rows and columns (and higher dimensions if you pass more vectors in). Here’s what the jj
array looks like:
jj
# Expected result
# array([[0, 1, 2],
# [0, 1, 2]])
I’ll talk about the indexing='ij'
argument below. But first, let’s take a look at an example.
A realistic use case
Ok, let’s look at an example. Say we have a simple linear regression model y = w0 + w1 * x
where w0
is the intercept and w1
is the slope. If x
is a 1-dimensional NumPy array of inputs and y
is a 1-dimensional NumPy array of targets, we can calculate the mean-squared error (MSE) loss for a given w0
and w1
with the following:
yhat = w0 + w1 * x
loss = np.square(yhat - y).mean()
Notice, if our data is fixed, then the loss becomes a function of w0
and w1
. That means we can plot the loss surface to see how it varies with w0
and w1
. This is a perfect use case for meshgrid()
.
First, let’s fix w0=3
and w1=2.5
and then generate 50 data points using this model plus some random noise:
x = np.random.normal(0, 1, size=50)
y = 3 + 2.5 * x + np.random.normal(0, 1, size=50)
A scatter plot of these points points will look something like this. You can see visually that the y-intercept is roughly 3 and the slope is roughly 2.5.
Now we want to create all possible values of w0
and w1
in the range [-5, 5]
so we can compute the loss value for those pairs:
w0, w1 = np.meshgrid(
np.linspace(-5, 5),
np.linspace(-5, 5),
indexing='ij',
)
w0.shape
# Expected result
# (50, 50)
At this point w0
and w1
both have shape (50, 50)
. That’s because np.linspace()
creates a 50 evenly spaced values by default. If we flatten w0
and w1
and line up the elements, we get 2500 evenly spaced points on a grid.
Let’s take a look at the first 10 points:
list(zip(w0.ravel(), w1.ravel()))[:10]
# Expected result
# [(-5.0, -5.0),
# (-5.0, -4.795918367346939),
# (-5.0, -4.591836734693878),
# (-5.0, -4.387755102040816),
# (-5.0, -4.183673469387755),
# (-5.0, -3.979591836734694),
# (-5.0, -3.7755102040816326),
# (-5.0, -3.571428571428571),
# (-5.0, -3.36734693877551),
# (-5.0, -3.163265306122449)]
Now, we can compute the model for each (w0, w1)
pair for all 50 random data points. That allows us to compute MSE for each of the weight values:
yhat = w0[:, None] + w1[:, None] * x
loss = np.square(y - yhat).mean(-1)
Here we expand the w0
and w1
arrays so that we can use NumPy broadcasting to compute yhat
for all points at the same time. The result yhat
has shape (2500, 50)
: it’s the predicted value for all pairs of (w0, w1)
for all 50 points. After that, we use the mean function to reduce the column dimension and compute MSE for each point on the grid.
Now, we can reshape loss to be (50, 50)
and plot loss for each pair of weights:
loss = loss.reshape(w0.shape)
plt.contourf(w0, w1, loss, 30)
plt.plot(3, 2.5, 'x', c='white', ms=7)
plt.xlabel("w0")
plt.ylabel("w1")
plt.title("Loss surface");
That’s a nice looking loss surface! I’ve added a mark at the true values for w0
and w1
. The loss surface will never have its optimum at exactly the true values since it’s learning the function on a random sample of all data, but this is pretty close!
Indexing
Ok let’s revisit indexing. The indexing='ij'
argument tells NumPy to use matrix indexing instead of Cartesian indexing. For 2-dimensional meshgrids, matrix indexing produces coordinate arrays that are the transpose of Cartesian coordinate arrays. The question is whether those coordinate arrays should be interpreted as the indices of a matrix (matrix indexing) or points in Euclidean space (Cartesian indexing).
Here’s a diagram that displays the index pairs for xy
versus ij
indexing:
Notice for Cartesian indexing, the x-axis increases as your move “to the right” and the y-axis increases as you move “down”.
Other tensor libraries
In NumPy, you actually don’t need meshgrid() but it’s good to know it because the same function exists in PyTorch and TensorFlow. Although the API is more limited in both cases (and the default indexing for PyTorch is different) the basic usage is roughly same:
import torch
x = torch.arange(2)
y = torch.arange(2)
xx, yy = torch.meshgrid(x, y)
xx
# Expected result
# tensor([[0, 0],
# [1, 1]])
And:
import tensorflow as tf
x = tf.range(2)
y = tf.range(2)
xx, yy = tf.meshgrid(x, y, indexing='ij')
xx
# Expected result
# <tf.Tensor: shape=(2, 2), dtype=int32, numpy=
# array([[0, 0],
# [1, 1]], dtype=int32)>
One thing to watch out for: in NumPy and TensorFlow, the default indexing is Cartesian, whereas in PyTorch, the default indexing is matrix. If you want to avoid confusion, you can plan to set indexing='ij'
whenever you call meshgrid()
in NumPy or TensorFlow. You probably don’t want to force yourself to remember which indexing gets returned by default.