Foundations
Xavier initialization
Xavier initialization is a method for setting the initial weights of a neural network that aims to reduce the vanishing or exploding gradient problems, particularly in deep networks. It initializes weights based on the number of input and output neurons in a layer, drawing values from a distribution scaled to keep the variance of activations roughly the same across layers.
Explanation
In deep neural networks, if the initial weights are too small, the signal shrinks as it passes through each layer, leading to vanishing gradients and slow learning. Conversely, if the initial weights are too large, the signal grows exponentially, leading to exploding gradients and unstable training.

Xavier initialization, also known as Glorot initialization, addresses this by setting the weights such that the variance of the activations remains approximately constant across layers. Specifically, for a layer with n_in input neurons and n_out output neurons, the weights are typically drawn from a uniform distribution U(-sqrt(6 / (n_in + n_out)), sqrt(6 / (n_in + n_out))) or from a normal distribution with mean 0 and standard deviation sqrt(2 / (n_in + n_out)).

This scheme is most effective when the activation function is linear or approximately linear around zero (e.g., tanh, but not ReLU without modification). Variants such as He initialization are preferred for ReLU activations.
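The two sampling rules above can be sketched in a few lines of NumPy. The function names `xavier_uniform` and `xavier_normal` are illustrative, not part of any particular library's API:

```python
import numpy as np

def xavier_uniform(n_in, n_out, seed=None):
    # Uniform Xavier/Glorot: U(-limit, limit) with
    # limit = sqrt(6 / (n_in + n_out))
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def xavier_normal(n_in, n_out, seed=None):
    # Normal Xavier/Glorot: N(0, sigma^2) with
    # sigma = sqrt(2 / (n_in + n_out))
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, sigma, size=(n_in, n_out))

# Both distributions share the same variance, 2 / (n_in + n_out),
# which is what keeps activation variance roughly constant per layer.
W = xavier_uniform(512, 256, seed=0)
```

Note that the uniform bound sqrt(6 / (n_in + n_out)) is chosen precisely so that the uniform distribution's variance, limit^2 / 3, equals the 2 / (n_in + n_out) variance of the normal variant.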