
Batch normalization

Batch normalization is a technique used in neural networks to improve training speed and stability. For each mini-batch, it normalizes the activations of a layer to have a mean of zero and a standard deviation of one, then applies a learned scale and shift.

Explanation

Batch normalization (BatchNorm) addresses the internal covariate shift problem, where the distribution of network activations changes during training due to changing parameters in preceding layers. This shift can slow down training because subsequent layers must continually adapt to the new distribution. BatchNorm mitigates this by normalizing each layer's input.

Specifically, for each mini-batch, BatchNorm calculates the mean and variance of the activations. It then normalizes the activations by subtracting the mean and dividing by the standard deviation. To maintain representational capacity, BatchNorm introduces two learnable parameters, gamma (scale) and beta (shift), allowing the network to learn the optimal scaling and shifting of the normalized activations.

BatchNorm is typically applied after a linear transformation (e.g., a fully connected or convolutional layer) and before the activation function. It allows higher learning rates, makes the network less sensitive to initialization, and can act as a regularizer.
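The per-mini-batch computation described above can be sketched in a few lines of NumPy. This is a minimal illustration of the training-time forward pass only (the function name, the small epsilon for numerical stability, and the example values are choices made here, not part of any particular library's API; real frameworks also track running statistics for use at inference time):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-time BatchNorm forward pass for a (batch, features) array."""
    mean = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # learnable scale and shift

# Example: a mini-batch of 4 samples with 3 features each
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0],
              [4.0, 8.0, 12.0]])
gamma = np.ones(3)   # initialized so the layer starts as a pure normalization
beta = np.zeros(3)
y = batch_norm(x, gamma, beta)
```

With gamma at one and beta at zero, each output column has approximately zero mean and unit variance; during training, the network adjusts gamma and beta so it can recover any scale and offset the following layer needs.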

Related Terms