The Keras Dense Layer

Ben Cook • Posted 2020-12-31 • Last updated 2021-03-24

The Dense class from Keras is an implementation of the simplest neural network building block: the fully connected layer. But using it can be a little confusing because the Keras API adds a bunch of configurable functionality. This post explains the layer in two sections (feel free to skip ahead): the math behind fully connected layers, and the Keras Dense API.

Fully connected layers

At its core, a fully connected layer is a dot product between a data matrix and a weights matrix:

y = XW

The data will have an input shape of (batch_size, n_features) and the weight matrix will have shape (n_features, n_units). This equation is important, because it shows you that the output shape will be (batch_size, n_units).
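
For example, here is a minimal sketch of those shapes in TensorFlow (the specific numbers are arbitrary):

import tensorflow as tf

batch_size, n_features, n_units = 3, 4, 2
X = tf.random.normal((batch_size, n_features))  # data matrix
W = tf.random.normal((n_features, n_units))     # weight matrix

y = tf.matmul(X, W)
y.shape  # TensorShape([3, 2]), i.e. (batch_size, n_units)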

Additionally, you typically want to add a bias vector with n_units elements:

y = XW + b

When n_units is 1, this simplifies to linear regression and b becomes the y-intercept of the equation.

So far, this is a linear transformation. But often, you want to pass the output through an activation function to make it non-linear. For hidden layers in a neural network, the non-linearity is what lets the model learn more than a single linear mapping (a stack of purely linear layers collapses into one linear layer). For the final layer, activation functions are often used to give the output desirable characteristics, like being bounded between 0 and 1 so it can be interpreted as a probability.

If we let the activation be an arbitrary function, f, then the full operation can be written:

y = f(XW + b)

During training, W and b will both be learned in order to best fit the data (X, y).
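
Written out with plain TensorFlow ops, the whole operation is just a few lines (a sketch, with sigmoid standing in for the arbitrary activation f):

import tensorflow as tf

X = tf.random.normal((3, 4))  # (batch_size, n_features)
W = tf.random.normal((4, 2))  # (n_features, n_units)
b = tf.zeros((2,))            # (n_units,) bias vector

y = tf.sigmoid(tf.matmul(X, W) + b)  # f(XW + b), shape (3, 2)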

API

Initializing a dense layer in Keras is easy:

import tensorflow as tf

linear = tf.keras.layers.Dense(
    2,
    use_bias=False,
    kernel_initializer='ones',
)

In the layer above, the output dimension is 2 (referred to as n_units in the previous section), there is no bias term, and the weights (called a kernel in Keras) are initialized to ones. The only required argument is the first positional argument: the number of output units.

Once defined, you can add a dense layer to a model. But you can also apply it as a forward pass to a tensor to try it out:

x = tf.ones((1, 4))
linear(x)

# Expected result
# <tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[4., 4.]], dtype=float32)>

Notice the input dimension (n_features) is 4 in the x tensor and is inferred automatically by the dense layer. This lets you re-use the same model code for inputs with different feature dimensions. Additionally, the output is a vector of fours because the kernel weights are all ones and each output unit sums four ones.
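
You can confirm the inferred weight shape by inspecting the kernel, which exists once the layer has been built by the call above. And dropping the layer into a model looks like this (a sketch with arbitrary layer sizes):

linear.kernel.shape  # TensorShape([4, 2]), i.e. (n_features, n_units)

# A small model built from Dense layers
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1),
])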

A few other points about the Dense API:

  • Bias is optional but the default is to add it (use_bias=True).
  • Activation is optional. The default is linear (no activation), but you can add one by passing either the activation's string identifier (e.g. activation='relu') or the activation function itself (e.g. activation=tf.keras.activations.relu). Check out Keras activations for more information.
  • If the input has more than 2 dimensions, Dense operates on the last axis: you can think of Keras as flattening all but the last dimension, doing the original operation and then restoring the leading dimensions. So, for example, a (2, 3, 4) tensor run through a dense layer with 10 units results in a (2, 3, 10) output tensor (see the sketch after this list).
  • Initialization can be customized for weights and bias separately, but the defaults are reasonable. Check out Keras initializers for more information.
  • Regularization can be applied to weights and bias separately. Check out Keras regularizers for more information.
  • Constraints can be applied to weights and bias separately. Check out Keras constraints for more information.
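
Here is a quick sketch of the activation and higher-dimensional input points from the list above (the layer and tensor are just for illustration):

import tensorflow as tf

dense = tf.keras.layers.Dense(10, activation='relu')

x = tf.ones((2, 3, 4))  # input with more than 2 dimensions
dense(x).shape          # TensorShape([2, 3, 10]): only the last axis changes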

By the way, if you need something a little more custom, check out my post on custom Keras layers.