The Dense class from Keras is an implementation of the simplest neural network building block: the fully connected layer. But using it can be a little confusing because the Keras API adds a bunch of configurable functionality. This post will explain the layer in two sections, fully connected layers and the Keras API (feel free to skip ahead):
Fully connected layers
At its core, a fully connected layer is a dot product between a data matrix and a weights matrix:
y = XW
The data will have an input shape of (batch_size, n_features) and the weight matrix will have shape (n_features, n_units). This equation is important because it shows you that the output shape will be (batch_size, n_units).
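To make the shapes concrete, here is a minimal sketch in plain TensorFlow (the sizes are made up for illustration):

import tensorflow as tf

X = tf.ones((8, 4))    # batch_size=8, n_features=4
W = tf.ones((4, 3))    # n_features=4, n_units=3
y = tf.matmul(X, W)    # the dot product XW
print(y.shape)         # (8, 3), i.e. (batch_size, n_units)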
Additionally, you typically want to add an n_units-length bias vector:
y = XW + b
When n_units is 1, this simplifies to linear regression and b becomes the y-intercept of the equation.
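For a quick sketch of the n_units = 1 case (the numbers here are purely illustrative), each row of X collapses to a single value and b shifts it like a y-intercept:

import tensorflow as tf

X = tf.constant([[1.0], [2.0], [3.0]])  # batch_size=3, n_features=1
W = tf.constant([[2.0]])                # a single weight (the slope)
b = tf.constant([0.5])                  # the bias (the y-intercept)
y = tf.matmul(X, W) + b                 # y = XW + b
print(y)                                # [[2.5], [4.5], [6.5]]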
So far, this is a linear transformation. But often, you want to pass the output through an activation function to make it non-linear. For hidden layers in a neural network, this helps with stability during training. For the final layer, activation functions are often used to give the output desirable characteristics like being bounded between 0 and 1 (so they can be interpreted as probabilities).
If we let the activation be an arbitrary function f, then the full operation can be written:
y = f(XW + b)
During training, W and b will both be learned in order to best fit the data (X, y).
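Putting the pieces together, here is a minimal sketch of the full forward pass; the sigmoid is just one example of f, chosen because it bounds the output between 0 and 1:

import tensorflow as tf

X = tf.ones((2, 4))                  # batch_size=2, n_features=4
W = tf.ones((4, 1))                  # n_features=4, n_units=1
b = tf.zeros((1,))
y = tf.sigmoid(tf.matmul(X, W) + b)  # y = f(XW + b) with f = sigmoid
print(y)                             # values squashed into (0, 1)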
API
Initializing a dense layer in Keras is easy:
import tensorflow as tf
linear = tf.keras.layers.Dense(
    2,
    use_bias=False,
    kernel_initializer='ones',
)
In the layer above, the output dimension is 2 (referred to as n_units in the previous section), there is no bias term, and the weights (called a kernel in Keras) are initialized to ones. The only required argument is the first positional argument: the number of output units.
Once defined, you can add a dense layer to a model. But you can also apply it as a forward pass to a tensor to try it out:
x = tf.ones((1, 4))
linear(x)
# Expected result
# <tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[4., 4.]], dtype=float32)>
Notice the input dimension (n_features) is set to 4 by the x tensor and inferred automatically by the dense layer. This allows you to re-use the same code for multiple input shapes if you write your model correctly. Additionally, the output is a vector of fours because the kernel weights are all ones, so each output unit simply sums the four input values.
A few other points about the Dense API:
- Bias is optional, but the default is to add it (use_bias=True).
- Activation is optional. The default is linear (no activation), but you can add one by specifying either the string identifier of the activation (e.g. activation='relu') or the actual activation function (e.g. activation=tf.keras.activations.relu). Check out Keras activations for more information.
- If the input has more than 2 dimensions, you can think of Keras as flattening all but the last dimension, doing the original operation and then reshaping all but the last dimension back. So, for example, a (2, 3, 4) tensor run through a dense layer with 10 units will result in a (2, 3, 10) output tensor (see the sketch after this list).
- Initialization can be customized for weights and bias separately, but the defaults are reasonable. Check out Keras initializers for more information.
- Regularization can be applied to weights and bias separately. Check out Keras regularizers for more information.
- Constraints can be applied to weights and bias separately. Check out Keras constraints for more information.
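As a sketch combining several of these options (the specific choices of activation, initializers, regularizer and constraint below are just examples, not defaults):

import tensorflow as tf

dense = tf.keras.layers.Dense(
    10,
    activation='relu',
    kernel_initializer='random_normal',
    bias_initializer='zeros',
    kernel_regularizer=tf.keras.regularizers.l2(1e-4),
    kernel_constraint=tf.keras.constraints.MaxNorm(3),
)

x = tf.ones((2, 3, 4))  # an input with more than 2 dimensions
y = dense(x)
print(y.shape)          # (2, 3, 10): only the last dimension changes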
By the way, if you need something a little more custom, check out my post on custom Keras layers.