Machine Learning
Reinforcement Learning from Human Feedback (RLHF)
A machine learning technique that uses human preferences as a reward signal to fine-tune a model, aligning its behavior with human values and expectations.
Explanation
RLHF is a multi-stage process used to train large language models (LLMs) and other AI systems. It typically involves four stages: pre-training a model on a large dataset; collecting human preference data, where annotators rank or rate model outputs; training a reward model to predict these human preferences; and finally using reinforcement learning (often PPO, Proximal Policy Optimization) to optimize the original model against the reward model. This process can improve safety, reduce harmful or hallucinated outputs, and make the model follow instructions more reliably than supervised fine-tuning alone.
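The reward-modeling and RL stages can be illustrated with a toy sketch. This is not an actual RLHF implementation: it assumes model outputs are represented by small fixed feature vectors, fits a linear reward model with the Bradley-Terry pairwise loss used in RLHF reward modeling (minimizing -log sigmoid(r(chosen) - r(rejected))), and then nudges a softmax "policy" over a handful of candidate outputs with a simple REINFORCE update standing in for PPO. All names and shapes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Hypothetical hidden preference direction; used only to label toy data.
true_w = np.array([1.0, -0.5, 2.0, 0.3])

# Stage 2 (data): preference pairs (features_chosen, features_rejected).
pairs = []
for _ in range(200):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    pairs.append((a, b) if true_w @ a >= true_w @ b else (b, a))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stage 3 (reward model): fit linear r(x) = w @ x with the
# Bradley-Terry loss, -log sigmoid(r(chosen) - r(rejected)).
w = np.zeros(dim)
lr = 0.1
for _ in range(300):
    grad = np.zeros(dim)
    for chosen, rejected in pairs:
        p = sigmoid(w @ chosen - w @ rejected)
        grad += (p - 1.0) * (chosen - rejected)  # dL/dw for the pair
    w -= lr * grad / len(pairs)

# Stage 4 (RL): a softmax policy over 5 candidate "outputs",
# updated with REINFORCE plus a baseline (a stand-in for PPO).
candidates = rng.normal(size=(5, dim))
logits = np.zeros(5)
for _ in range(200):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    i = rng.choice(5, p=probs)          # sample an output
    reward = w @ candidates[i]          # score it with the reward model
    baseline = probs @ (candidates @ w) # expected reward under the policy
    logits += 0.1 * (reward - baseline) * (np.eye(5)[i] - probs)

probs = np.exp(logits - logits.max())
probs /= probs.sum()
print("reward-model weights:", np.round(w, 2))
print("policy distribution:", np.round(probs, 2))
```

Over training, the policy's probability mass shifts toward the candidates the learned reward model scores highly, which is the essence of the final RLHF stage; real systems add a KL penalty against the pre-trained model to keep outputs from drifting too far.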