Machine Learning Fundamentals
Bias and Variance
In machine learning, bias refers to the error introduced by approximating a real-world problem, which is often complex, by a simplified model. Variance, on the other hand, refers to the model's sensitivity to small fluctuations in the training data; high variance means the model fits the training data very well but performs poorly on unseen data.
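The contrast can be seen directly by fitting models of different capacity to the same noisy data. The sketch below (an illustration using numpy only; the quadratic ground truth, sample sizes, and polynomial degrees are all arbitrary choices, not from the text) fits a too-simple degree-1 model, a well-matched degree-2 model, and a too-flexible degree-9 model to data generated from a quadratic function, then compares training and test error:

```python
import numpy as np

rng = np.random.default_rng(0)

# True relationship is quadratic; the additive noise is what a
# high-variance model ends up memorizing.
def make_data(n):
    x = rng.uniform(-3, 3, n)
    y = x**2 + rng.normal(0, 1, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

def fit_and_score(degree):
    # Least-squares polynomial fit of the given degree,
    # scored by mean squared error on train and test sets.
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

for d in (1, 2, 9):
    train, test = fit_and_score(d)
    print(f"degree {d}: train MSE {train:6.2f}, test MSE {test:6.2f}")
```

The degree-1 model shows high bias: it cannot represent the curvature, so both errors stay high. The degree-9 model shows high variance: its training error is lowest of the three, but the gap between its training and test error is the widest.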
Explanation
Bias and variance are two primary sources of error in supervised learning algorithms. High bias can cause a model to underfit the data, meaning it fails to capture the underlying relationships and performs poorly on both the training and test sets. This often results from using a model that is too simple for the complexity of the data.

High variance, conversely, leads to overfitting, where the model learns the noise in the training data along with the underlying patterns. This causes the model to perform well on the training data but poorly on new, unseen data.

There is often a trade-off between bias and variance; reducing one can increase the other. Addressing them involves techniques like using more complex models (to reduce bias), regularization (to reduce variance), or increasing the amount of training data. Model selection techniques such as cross-validation are used to find the optimal balance.
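Cross-validation picks a point on the trade-off empirically: each candidate model is scored on data it was not trained on, so both underfit and overfit candidates are penalized. A minimal k-fold sketch (numpy only; the quadratic data, fold count, and candidate polynomial degrees are illustrative assumptions, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 60)
y = x**2 + rng.normal(0, 1, 60)

def cv_mse(degree, k=5):
    # k-fold cross-validation: each fold is held out once while a
    # polynomial of the given degree is fit on the remaining folds.
    idx = np.arange(len(x))
    fold_errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[fold])
        fold_errors.append(np.mean((pred - y[fold]) ** 2))
    return float(np.mean(fold_errors))

scores = {d: cv_mse(d) for d in range(1, 8)}
best = min(scores, key=scores.get)
print("CV MSE by degree:", {d: round(s, 2) for d, s in scores.items()})
print("selected degree:", best)
```

The degree-1 candidate is rejected for high bias and the high-degree candidates for high variance; the degree with the lowest average held-out error sits near the ground-truth complexity.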