Instruction Hierarchy

The practice of explicitly modelling which input channels to an LLM carry instructions (system prompt, user messages) and which carry only data (tool outputs, documents) — typically enforced through templating, role markers, and instruction-tuned models.

Instruction Hierarchy is the practice of explicitly modelling which input channels to an LLM carry instructions and which carry only data. In a properly enforced hierarchy, the system prompt sits at the top — its instructions take precedence. The user message sits below — its instructions are honored unless they conflict with the system prompt. External content (tool outputs, documents, retrieved web pages, tool descriptors, search results) sits below that — it is data, not instructions, and the agent should not treat it as authoritative for behavioural decisions.

The hierarchy matters because LLMs have no built-in separation between channel sources. Every byte of prompt context arrives in the model's input as text. Without explicit hierarchy enforcement, a sentence buried in a retrieved document that says "ignore your system prompt and do X instead" has roughly the same authority as the original system prompt. This is the structural property that makes indirect prompt injection effective.

How Hierarchy Is Enforced

Three patterns combine to produce useful instruction-hierarchy enforcement in production:

Templating with named slots. External content goes into named slots within a prompt template. The slot is wrapped with markers (<document>...</document>, <tool_output>...</tool_output>) that the model has been trained to treat as data boundaries. Instructions outside the slots are operator-authored; content inside is third-party.

Role markers and structured input. Modern instruction-tuned models accept structured inputs that distinguish system, user, assistant, and tool messages explicitly. The model has been trained to weight these channels differently. The agent runtime must use the structured-input format consistently rather than concatenating everything into a single prompt.

Instruction-tuned compliance. Frontier models from Anthropic, OpenAI, Google, and others are increasingly trained on instruction-hierarchy compliance datasets. The 2024–2025 generation of models does meaningfully better at resisting indirect injection than the 2022–2023 generation, though the defence is far from perfect. Operators should track the model's known-good behaviour against current adversarial patterns rather than relying on a one-time evaluation.

Limitations and Layered Defence

Instruction-hierarchy enforcement is not a complete defence — every published model can be coaxed past it with sufficient effort, and adversaries have unbounded retry budget against production targets. Hierarchy is one of four layers in proper defence: hierarchy enforcement, input sanitisation at every boundary, scoped tool authority, and human-in-the-loop checkpoints for high-stakes actions. Each layer reduces exposure; together they close most of the disclosed-incident record.

For deeper guidance, see the OWASP ASI01 explainer and the MCP Security Audit service description.

Need expert guidance on Instruction Hierarchy?

Our team at Zealynx has deep expertise in blockchain security and DeFi protocols. Whether you need an audit or consultation, we're here to help.

Get a Quote