
Overparameterization

Overparameterization refers to a model having more parameters than can be justified by the amount of training data. Although classical statistics treats this as something to avoid, overparameterized models in deep learning often outperform their smaller counterparts, defying traditional statistical learning theory.

Explanation

In traditional statistical learning theory, overparameterization is viewed negatively because it can lead to overfitting: the model learns the training data too well, including its noise, and performs poorly on unseen data. Deep learning models, particularly large neural networks, often exhibit the opposite behavior. Despite having a huge number of parameters, often exceeding the number of training examples, these models generalize surprisingly well. This phenomenon is not fully understood, but some theories suggest that the high dimensionality of the parameter space makes it easier for optimization to find solutions that are both accurate and robust. Regularization techniques, such as dropout and weight decay, also play a crucial role in preventing overfitting in overparameterized models. The 'double descent' phenomenon describes how test error changes with model size: error first falls, then rises as the model approaches the interpolation threshold (the point where it has just enough parameters to fit the training data exactly), and then falls again as the model grows well past that threshold, demonstrating the benefits of extreme overparameterization.
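The effect can be sketched in a toy setting. The snippet below (an illustrative sketch, not a definitive demonstration; the target function, sample size, and polynomial degrees are all assumptions chosen for illustration) fits polynomial models of increasing degree to a handful of noisy samples. Once the degree reaches or exceeds the number of training points, the least-squares system is underdetermined, and `numpy.linalg.pinv` returns the minimum-norm interpolating solution, which is the linear-model analogue of an overparameterized fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: 15 noisy samples of a smooth target on [-1, 1].
n_train = 15
x_train = np.linspace(-1.0, 1.0, n_train)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(n_train)

x_test = np.linspace(-1.0, 1.0, 200)
y_test = np.sin(2 * np.pi * x_test)

def fit_min_norm(degree):
    """Least-squares polynomial fit in a Legendre basis.

    For degree + 1 >= n_train the system is underdetermined, and
    pinv yields the minimum-norm solution, which interpolates the
    training data (train error near zero)."""
    phi_train = np.polynomial.legendre.legvander(x_train, degree)
    phi_test = np.polynomial.legendre.legvander(x_test, degree)
    w = np.linalg.pinv(phi_train) @ y_train
    train_mse = float(np.mean((phi_train @ w - y_train) ** 2))
    test_mse = float(np.mean((phi_test @ w - y_test) ** 2))
    return train_mse, test_mse

# Underparameterized, at the interpolation threshold, and far beyond it.
for degree in (3, 14, 60):
    tr, te = fit_min_norm(degree)
    print(f"degree={degree:3d}  train MSE={tr:.2e}  test MSE={te:.2e}")
```

At degree 14 the model has exactly enough parameters to interpolate the 15 points, which is where test error typically peaks; pushing the degree far beyond that tends to shrink it again, mirroring the second descent described above.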

Related Terms