All Threats
38 Attack Vectors
Prompt Manipulation
Attacks where an adversary alters or overrides the model's intended behavior by embedding instructions in user inputs or external data sources. This includes jailbreak attempts that bypass safeguards, prompt injections (direct or indirect) that force unauthorized actions, and hidden instructions concealed within comments, code blocks, or other structures.
Prompt Manipulation
7 Attack Vectors
Abusing Legitimate Functions
Attacks that exploit the model's intended capabilities to perform harmful or malicious tasks. Examples include generating disinformation or influence content, assisting in malware development or obfuscation, extracting sensitive information through crafted prompts, or enabling social engineering and fraud. These attacks do not necessarily bypass restrictions but instead misuse the model's normal functions for adversarial goals.
Function Abuse
14 Attack Vectors
Suspicious Patterns
Prompt structures that signal malicious intent or evasion attempts. These patterns often hide instructions through obfuscation, Unicode manipulation, or token perturbation. They may also involve chained injections where outputs are re-used to escalate control, or fragmented instructions spread across multiple prompts to bypass filters and detection mechanisms.
Suspicious Patterns
9 Attack Vectors
Abnormal Outputs
Model responses that indicate compromise or unintended disclosure. This includes leaking the system prompt, internal policies, or reasoning; exposing credentials, API keys, or personal data; and producing harmful or illegal content such as exploit code, misinformation, or guidance for attacks. Abnormal outputs serve as observable indicators that the system has been manipulated or misused.
Abnormal Outputs
8 Attack Vectors
All Threats