
Bidirectional Encoder Representations from Transformers (BERT)

Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based machine learning model for natural language processing (NLP). BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both the left and right context in all layers.

Explanation

BERT's key innovation is its bidirectional training. Unlike earlier models that read text sequentially (left-to-right or right-to-left), BERT attends to the entire sequence at once. This is achieved through two unsupervised pre-training tasks: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). MLM randomly masks some of the input tokens, and the model learns to predict the masked words from the surrounding context. NSP trains the model to understand the relationship between sentences by predicting whether two given sentences are consecutive in the original document.

The architecture consists of multiple stacked Transformer encoder layers. Pre-trained BERT models can be fine-tuned for specific downstream tasks such as question answering, sentiment analysis, and text classification, often achieving state-of-the-art results with minimal task-specific modifications. This ability to capture bidirectional, contextual relationships has significantly advanced NLP.
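To make the MLM corruption step concrete, here is a minimal Python sketch of how prediction targets are chosen, following the scheme described in the BERT paper: roughly 15% of positions are selected, and of those, 80% are replaced with `[MASK]`, 10% with a random vocabulary token, and 10% are left unchanged. The function name, the tiny vocabulary, and the label convention are illustrative assumptions, not taken from any BERT codebase.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style MLM corruption (illustrative sketch).

    Picks ~mask_prob of positions as prediction targets; of those,
    80% become [MASK], 10% a random vocab token, 10% stay unchanged.
    Returns (corrupted, labels), where labels[i] is the original
    token at target positions and None elsewhere.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)              # model must predict this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)      # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)       # 10%: keep, but still a target
        else:
            labels.append(None)             # not a prediction target
            corrupted.append(tok)
    return corrupted, labels

tokens = "the cat sat on the mat".split()
vocab = ["dog", "ran", "under", "hat", "the", "cat"]
corrupted, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
```

During pre-training the model only incurs loss at the positions where `labels` is set; the 10% "keep unchanged" case prevents the model from assuming every unmasked token is correct.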
