Agent Goal Hijack

The threat class OWASP ASI01 covers: any attack that redirects an AI agent's current task or planning objective through adversarial content in the prompt context, regardless of which input channel the content arrives through.

Agent Goal Hijack is the threat class that OWASP ASI01 covers — any attack that redirects an AI agent's current task or planning objective through adversarial content in the prompt context. The redirection can be subtle (the agent quietly diverts a portion of the task to attacker-chosen actions while completing the rest as expected) or overt (the agent abandons the original task entirely). What unifies the class is the mechanism: adversarial content in some input channel competes with the operator's intended objective for the agent's attention, and the LLM has no built-in mechanism to reliably resolve the competition in favour of the operator.

Goal hijacking is the gateway threat in agentic-AI security. A successful hijack alone is contained — the agent does the wrong thing for one task. A hijack combined with reachable exec primitives, credential-bearing tools, supply-chain compromise, or transaction-signing authority produces full system impact. Most disclosed CVEs in the MCP ecosystem map to ASI01 plus at least one downstream OWASP item, because successful exploitation requires both the redirection (ASI01) and the privileged subsystem the redirected agent then reaches.

How Goal Hijacks Compose Into Bigger Attacks

The composition pattern is consistent across the disclosed-incident record. Step one: the agent reads adversarial content from somewhere (a user message, a document, a tool output, a tool descriptor). Step two: the content's instructions redirect the agent's plan toward attacker-chosen actions. Step three: the redirected agent reaches a privileged subsystem — exec, credentials, signing — through tools the operator granted at install time. Step four: the privileged subsystem produces the actual impact (RCE, data exfiltration, fund loss, etc.).

The May 2025 GitHub MCP incident followed this pattern exactly: indirect prompt injection in a crafted issue body redirected the agent's task, and the redirected agent used its connected Git tools (which the operator had granted) to leak private-repo contents to the attacker. CVE-2025-54136 ("MCPoison") followed the same pattern through descriptor mutation. The April 2026 Anthropic SDK configuration-channel injection cluster follows it through configuration mutation. The pattern is structural; the specific channel of redirection varies by case.

Defensive Posture

Defending against goal hijacks requires layered defence rather than any single control: instruction hierarchy enforcement, content sanitisation at every input boundary, scoped tool authority, and explicit human-in-the-loop checkpoints for high-stakes actions. The OWASP ASI01 explainer walks through each layer in operational detail, and the MCP Security Checklist operationalises the controls per check.

Need expert guidance on Agent Goal Hijack?

Our team at Zealynx has deep expertise in blockchain security and DeFi protocols. Whether you need an audit or consultation, we're here to help.

Get a Quote