Optimization

Stochastic gradient descent (SGD)

Stochastic Gradient Descent (SGD) is an iterative optimization algorithm used to find the minimum of a function. In machine learning, this function represents the model's error (loss), and the algorithm adjusts the model's parameters (weights) to reduce this error using a single data point, or a small randomly selected subset of data points, at each iteration.
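The update rule can be illustrated with a minimal sketch: fitting the slope of a line with single-sample SGD. The data, learning rate, and iteration count here are illustrative assumptions, not prescribed values.

```python
import random

# Hypothetical noise-free data following y = 2*x; the goal is to recover w = 2.
data = [(x, 2.0 * x) for x in range(1, 6)]

w = 0.0      # model parameter (weight), initialized at zero
lr = 0.01    # learning rate (step size), chosen small enough to stay stable
random.seed(0)

for _ in range(1000):
    x, y = random.choice(data)     # pick ONE data point per iteration
    grad = 2 * (w * x - y) * x     # gradient of the squared error (w*x - y)**2
    w -= lr * grad                 # SGD update: step against the gradient

# After enough iterations, w approaches the true slope 2.0
```

Each iteration uses only one example, so individual steps are noisy, but on average they point toward the minimum of the loss.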

Explanation

Gradient Descent is a fundamental optimization algorithm that iteratively adjusts model parameters to minimize a loss function. Standard Gradient Descent calculates the gradient of the loss function using the entire training dataset in each iteration, making it computationally expensive for large datasets. SGD addresses this by approximating the gradient using only one data point, or a small batch of data points (a mini-batch), randomly selected from the training set in each iteration.

This approximation introduces noise into the optimization process, but it significantly speeds up computation per iteration. While the noisy updates can cause oscillations, they often help the algorithm escape local minima and converge faster to a good solution, especially in high-dimensional spaces.

The learning rate, a hyperparameter, controls the size of the steps taken during each parameter update, and careful tuning of it is crucial for the success of SGD. Variants of SGD, such as SGD with momentum and Adam, have been developed to further improve convergence and stability.
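Mini-batching and momentum can be sketched together. This is an illustrative implementation under assumed settings (synthetic noise-free data, a fixed batch size of 16, learning rate 0.01, momentum 0.9), not a reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data generated from known weights, so convergence is checkable.
true_w = np.array([1.5, -3.0])
X = rng.normal(size=(200, 2))
y = X @ true_w

w = np.zeros(2)                        # parameters to learn
v = np.zeros(2)                        # velocity vector for momentum
lr, momentum, batch_size = 0.01, 0.9, 16

for epoch in range(100):
    idx = rng.permutation(len(X))      # reshuffle each epoch for random mini-batches
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        Xb, yb = X[b], y[b]
        # Gradient of the mean squared error over this mini-batch only
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(b)
        v = momentum * v - lr * grad   # momentum accumulates past gradients
        w += v                         # parameter update along the velocity

# w converges toward true_w despite each step using only 16 of 200 examples
```

The velocity term smooths the noisy mini-batch gradients, damping oscillations while still allowing each step to be cheap; Adam extends this idea with per-parameter adaptive step sizes.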

Related Terms