Negative sampling
Negative sampling is a technique used in training word embeddings, particularly with models such as Word2Vec, to approximate the full softmax objective. Instead of updating all of the network's output weights on every training step, it updates only a small subset, significantly reducing the computational cost.
Explanation
With a softmax output layer over a large vocabulary, each training step requires computing a score for every word in the vocabulary in order to obtain the probability of the correct context word, which is computationally expensive. Negative sampling avoids this by recasting the multi-class classification problem as a binary classification problem.

For each observed (word, context) pair, which is treated as a positive sample, the algorithm randomly draws a few other words from the vocabulary to act as negative samples. The model then learns to distinguish the positive sample (the actual context word) from these negative samples (randomly chosen words). Because only the weights associated with the positive sample and the small set of negative samples are touched in each iteration, the cost per update no longer scales with the vocabulary size. The number of negative samples is a hyperparameter that needs to be tuned; the original Word2Vec paper suggests 5–20 for small datasets and 2–5 for large ones.

The training objective is to maximize the probability assigned to the positive sample while minimizing the probability assigned to the negative samples. This allows word embeddings to be trained efficiently on very large datasets.
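The update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference Word2Vec implementation: the array names (`W_in`, `W_out`), the function `train_pair`, and the uniform sampling of negatives are all assumptions made for brevity (real Word2Vec draws negatives from a smoothed unigram distribution, proportional to frequency raised to the 3/4 power). The key point it demonstrates is that each step touches only one input row and k + 1 output rows, never the full vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim, k = 1000, 50, 5  # k = number of negative samples per positive pair
W_in = rng.normal(scale=0.1, size=(vocab_size, dim))  # "input" (word) vectors
W_out = np.zeros((vocab_size, dim))                   # "output" (context) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(center, context, lr=0.025):
    """One SGD step on a single (center, context) positive pair
    plus k negatives (drawn uniformly here for simplicity)."""
    # label 1 for the true context word, 0 for each negative sample
    targets = np.concatenate(([context], rng.integers(0, vocab_size, size=k)))
    labels = np.zeros(k + 1)
    labels[0] = 1.0

    v = W_in[center]         # (dim,)
    U = W_out[targets]       # (k+1, dim)
    scores = sigmoid(U @ v)  # predicted P(label = 1) for each of the k+1 pairs

    # binary cross-entropy gradient; note that only these k+1 output rows
    # and the single input row are updated, not the whole vocabulary
    err = scores - labels    # (k+1,)
    W_in[center] -= lr * (err @ U)
    W_out[targets] -= lr * err[:, None] * v
```

Repeatedly calling `train_pair(w, c)` pushes `sigmoid(W_out[c] @ W_in[w])` toward 1 for the observed pair while pushing the scores of the sampled negatives toward 0, which is exactly the binary objective described above.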