MasterAI Agents

Prompt injection exploits the LLM's ability to follow instructions provided in the prompt. Attackers craft prompts that override or subvert the original instructions the model was designed to follow. This can lead to several outcomes, including: bypassing safety filters (e.g., generating harmful content), revealing internal system information (e.g., model architecture, training data), performing unauthorized actions (e.g., sending emails, accessing restricted data), and spreading misinformation (e.g., generating fabricated news articles). There are two primary categories of prompt injection: **direct injection**, where the malicious prompt is directly input by the attacker, and **indirect injection**, where the malicious prompt is embedded in external data sources (e.g., websites, documents) that the LLM processes. Defenses against prompt injection include input sanitization, prompt hardening (making the original prompt more resistant to manipulation), output monitoring, and the use of specialized security models to detect and block malicious prompts.

Prompt injection

Explanation

Related Terms