The Dense class from Keras is an implementation of the simplest neural network building block: the fully connected layer. But using it can be a little confusing because the Keras API adds a bunch of configurable functionality. This post explains the layer in two sections: the math of fully connected layers, and the Dense API in Keras (feel free to skip ahead).
Fully connected layers
At its core, a fully connected layer is a dot product between a data matrix and a weights matrix:
y = XW
The data will have an input shape of (batch_size, n_features) and the weight matrix will have shape (n_features, n_units). This equation is important because it shows you that the output shape will be (batch_size, n_units).
Additionally, you typically want to add a bias vector b with n_units entries:
y = XW + b
When n_units is 1, this simplifies to linear regression, and b becomes the y-intercept of the equation.
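To make the shapes concrete, here is a minimal NumPy sketch of y = XW + b (the sizes are illustrative):

```python
import numpy as np

batch_size, n_features, n_units = 3, 4, 2

X = np.ones((batch_size, n_features))  # data matrix, shape (3, 4)
W = np.ones((n_features, n_units))     # weight matrix, shape (4, 2)
b = np.zeros(n_units)                  # bias vector, one entry per unit

y = X @ W + b  # broadcasting adds b to every row
print(y.shape)  # (3, 2), i.e. (batch_size, n_units)
```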
So far, this is a linear transformation. But often, you want to pass the output through an activation function to make it non-linear. For hidden layers in a neural network, this is what lets a stack of layers represent more than a single linear map. For the final layer, activation functions are often used to give the output desirable characteristics, like being bounded between 0 and 1 (so it can be interpreted as a probability).
If we let the activation be an arbitrary function f, then the full operation can be written:
y = f(XW + b)
During training, W and b will both be learned in order to best fit the data.
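As a sketch, the full operation with a sigmoid activation (one common choice of f; any non-linearity works) looks like:

```python
import numpy as np

def sigmoid(z):
    # squashes values into (0, 1), so the output can be read as a probability
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[1.0, 2.0]])      # shape (1, n_features)
W = np.array([[0.5], [-0.25]])  # shape (n_features, 1)
b = np.array([0.1])

y = sigmoid(X @ W + b)  # y = f(XW + b)
print(y.shape)  # (1, 1); the value lies strictly between 0 and 1
```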
The Dense layer in Keras
Initializing a dense layer in Keras is easy:
```python
import tensorflow as tf

linear = tf.keras.layers.Dense(
    2,
    use_bias=False,
    kernel_initializer='ones',
)
```
In the layer above, the output dimension is 2 (referred to as n_units in the previous section), there is no bias term, and the weights (called a kernel in Keras) are initialized to ones. The only required argument is the first positional argument: the number of output units.
Once defined, you can add a dense layer to a model. But you can also apply it as a forward pass to a tensor to try it out:
```python
x = tf.ones((1, 4))
linear(x)
# Expected result:
# <tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[4., 4.]], dtype=float32)>
```
Notice the input dimension (n_features) is set to 4 in the x tensor and inferred automatically by the dense layer. This allows you to re-use the same code for multiple input shapes if you write your model correctly. Additionally, the output is a vector of fours because the kernel weights are all ones: each output unit is the dot product of the four ones in x with the four ones in its kernel column.
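You can confirm the inferred shape by inspecting the layer's kernel after the first call, which is when Keras builds the weights (a self-contained sketch, assuming TensorFlow is installed):

```python
import tensorflow as tf

linear = tf.keras.layers.Dense(2, use_bias=False, kernel_initializer='ones')

x = tf.ones((1, 4))
linear(x)  # the first call builds the weights

# n_features (4) was inferred from x; n_units (2) came from the constructor
print(linear.kernel.shape)  # (4, 2)
```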
A few other points about the Dense API:
- Bias is optional, but the default is to add it (use_bias=True).
- Activation is optional. The default is linear (no activation), but you can add one by specifying either the string identifier of the activation (e.g. activation='relu') or the actual activation function (e.g. activation=tf.keras.activations.relu). Check out Keras activations for more information.
- If the input has more than 2 dimensions, you can think of Keras as flattening all but the last dimension, applying the original operation, and then restoring the leading dimensions. So, for example, a (2, 3, 4) tensor run through a dense layer with 10 units will result in a (2, 3, 10) output tensor.
- Initialization can be customized for weights and bias separately, but the defaults are reasonable. Check out Keras initializers for more information.
- Regularization can be applied to weights and bias separately. Check out Keras regularizers for more information.
- Constraints can be applied to weights and bias separately. Check out Keras constraints for more information.
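The higher-rank behavior described in the bullets above can be checked directly (a sketch; the shapes match the example given there):

```python
import tensorflow as tf

dense = tf.keras.layers.Dense(10)

x = tf.ones((2, 3, 4))  # rank-3 input
y = dense(x)            # the dense layer acts on the last axis only

print(y.shape)  # (2, 3, 10): leading dimensions are preserved
```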
By the way, if you need something a little more custom, check out my post on custom Keras layers.