The Keras Dense Layer

Ben Cook • Posted 2020-12-31 • Last updated 2021-03-24

The Dense class from Keras is an implementation of the simplest neural network building block: the fully connected layer. But using it can be a little confusing because the Keras API adds a bunch of configurable functionality. This post explains the layer in two sections (feel free to skip ahead): the math behind fully connected layers, and the Keras Dense API.

Fully connected layers

At its core, a fully connected layer is a dot product between a data matrix and a weights matrix:

y = XW

The data will have an input shape of (batch_size, n_features) and the weight matrix will have shape (n_features, n_units). This equation is important, because it shows you that the output shape will be (batch_size, n_units).
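
For example, here is a minimal sketch of those shapes in TensorFlow (the specific numbers are arbitrary):

import tensorflow as tf

batch_size, n_features, n_units = 3, 4, 2
X = tf.random.normal((batch_size, n_features))  # data matrix
W = tf.random.normal((n_features, n_units))     # weight matrix

y = tf.matmul(X, W)
y.shape  # TensorShape([3, 2]), i.e. (batch_size, n_units)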

Additionally, you typically want to add a bias vector with n_units elements:

y = XW + b

When n_units is 1, this simplifies to linear regression and b becomes the y-intercept of the equation.

So far, this is a linear transformation. But often, you want to pass the output through an activation function to make it non-linear. For hidden layers in a neural network, the non-linearity is what lets the model learn more than a single linear mapping (a stack of purely linear layers collapses into one linear layer). For the final layer, activation functions are often used to give the output desirable characteristics, like being bounded between 0 and 1 so it can be interpreted as a probability.

If we let the activation be an arbitrary function, f, then the full operation can be written:

y = f(XW + b)

During training, W and b will both be learned in order to best fit the data (X, y).
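
Written out with plain TensorFlow ops, the whole operation is just a few lines (a sketch, with sigmoid standing in for the arbitrary activation f):

import tensorflow as tf

X = tf.random.normal((3, 4))  # (batch_size, n_features)
W = tf.random.normal((4, 2))  # (n_features, n_units)
b = tf.zeros((2,))            # (n_units,) bias vector

y = tf.sigmoid(tf.matmul(X, W) + b)  # f(XW + b), shape (3, 2)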

API

Initializing a dense layer in Keras is easy:

import tensorflow as tf

linear = tf.keras.layers.Dense(
    2,
    use_bias=False,
    kernel_initializer='ones',
)

In the layer above, the output dimension is 2 (referred to as n_units in the previous section), there is no bias term, and the weights (called a kernel in Keras) are initialized to ones. The only required argument is the first positional argument: the number of output units.

Once defined, you can add a dense layer to a model. But you can also apply it as a forward pass to a tensor to try it out:

x = tf.ones((1, 4))
linear(x)

# Expected result
# <tf.Tensor: shape=(1, 2), dtype=float32, numpy=array([[4., 4.]], dtype=float32)>

Notice the input dimension (n_features) is 4 in the x tensor and is inferred automatically by the dense layer. This lets you re-use the same model code for inputs with different feature dimensions. Additionally, the output is a vector of fours because the kernel weights are all ones and each output unit sums four ones.
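
You can confirm the inferred weight shape by inspecting the kernel, which exists once the layer has been built by the call above. And dropping the layer into a model looks like this (a sketch with arbitrary layer sizes):

linear.kernel.shape  # TensorShape([4, 2]), i.e. (n_features, n_units)

# A small model built from Dense layers
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(1),
])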

A few other points about the Dense API:

  • Bias is optional but the default is to add it (use_bias=True).
  • Activation is optional. The default is linear (no activation), but you can add one by passing either the activation's string identifier (e.g. activation='relu') or the activation function itself (e.g. activation=tf.keras.activations.relu). Check out Keras activations for more information.
  • If the input has more than 2 dimensions, Dense operates on the last axis: you can think of Keras as flattening all but the last dimension, doing the original operation and then restoring the leading dimensions. So, for example, a (2, 3, 4) tensor run through a dense layer with 10 units results in a (2, 3, 10) output tensor (see the sketch after this list).
  • Initialization can be customized for weights and bias separately, but the defaults are reasonable. Check out Keras initializers for more information.
  • Regularization can be applied to weights and bias separately. Check out Keras regularizers for more information.
  • Constraints can be applied to weights and bias separately. Check out Keras constraints for more information.
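
Here is a quick sketch of the activation and higher-dimensional input points from the list above (the layer and tensor are just for illustration):

import tensorflow as tf

dense = tf.keras.layers.Dense(10, activation='relu')

x = tf.ones((2, 3, 4))  # input with more than 2 dimensions
dense(x).shape          # TensorShape([2, 3, 10]): only the last axis changes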

By the way, if you need something a little more custom, check out my post on custom Keras layers.