OWASP LLM01:2025 - Prompt Injection Number One on the OWASP LLM Top 10

AI Team
owasp stochastic prompt-injection
OWASP LLM01:2025 - Prompt Injection Number One on the OWASP LLM Top 10

A Prompt Injection Vulnerability occurs when user prompts alter the LLM’s behavior or output in unintended ways. These inputs can affect the model even if they are imperceptible to humans, therefore prompt injections do not need to be human-visible/readable, as long as the content is parsed by the model. - OWASP LLM01:2025 Prompt Injection

Prompt Injection is a Big Problem

Recently OWASP put out its top 10 list for LLM security risks, and prompt injection is number one on the list.

Number one!

Stochastic Behavior

Given the stochastic influence at the heart of the way models work, it is unclear if there are fool-proof methods of prevention for prompt injection - OWASP LLM01:2025 Prompt Injection

As highlighted by OWASP, the probabilistic foundation of these models means there likely exists no absolute, fool-proof prevention method. This reality necessitates adopting a defense-in-depth approach, where multiple protective layers work in concert to safeguard AI systems. Rather than relying on a single barrier, organizations must implement overlapping protective measures.

This would include PID:one’s Prompt Injection Detection rules.

Possible Mitigations

As can be seen below, there are a number of mitigations that can be applied to prevent prompt injection. Managing the input prompt is a key part of this, and this is where PID:one’s Prompt Injection Detection rules come into play.

Organisations building AI applications and agents will need to implement all of these mitigations, and potentially more as new attack vectors are discovered.

Mitigation Description
Implement input and output filtering Define sensitive categories and construct rules for identifying and handling such content. Apply semantic filters and use string-checking to scan for non-allowed content. Evaluate responses using the RAG Triad: Assess context relevance, groundedness, and question/answer relevance to identify potentially malicious outputs.
Constrain model behavior Provide specific instructions about the model’s role, capabilities, and limitations within the system prompt. Enforce strict context adherence, limit responses to specific tasks or topics, and instruct the model to ignore attempts to modify core instructions.
Define and validate expected output formats Specify clear output formats, request detailed reasoning and source citations, and use deterministic code to validate adherence to these formats.
Enforce privilege control and least privilege access Provide the application with its own API tokens for extensible functionality, and handle these functions in code rather than providing them to the model. Restrict the model’s access privileges to the minimum necessary for its intended operations.
Require human approval for high-risk actions Implement human-in-the-loop controls for privileged operations to prevent unauthorized actions.
Segregate and identify external content Separate and clearly denote untrusted content to limit its influence on user prompts.
Conduct adversarial testing and attack simulations Perform regular penetration testing and breach simulations, treating the model as an untrusted user to test the effectiveness of trust boundaries and access controls.

First Line of Defense

One of the first lines of defence is to manage the prompt. This is where PID:one’s Prompt Injection Detection rules come in. We can help you detect malicious prompts before they reach the LLM and act as part of a strong defence in depth strategy.

Further Reading

Ready to defend?

AI needs a first line
of defense. Let's go!

Request a demo →