Agentjacking
A security pattern where attacker-written operational data is returned to an AI agent as if it were trusted diagnostic or system guidance, steering later tool use or execution.
Agentjacking is the security pattern where an attacker plants content inside an operational system that an AI agent later treats as trustworthy context. The important detail is that the attacker does not need to compromise the agent directly. Instead, they write into a data source that the agent is allowed to read, such as telemetry, issue queues, support systems, alerts, or other workflow tools, and rely on the agent to convert that content into action.
The term became prominent in 2026 research around AI coding agents querying Sentry issues through Model Context Protocol (MCP) integrations. In that pattern, attacker-written crash or issue content could be returned through a trusted wrapper and interpreted by the model as if it were legitimate remediation guidance. Once the agent accepts the content as trustworthy, the path to shell execution, code modification, secret exposure, or CI mutation can be short.
This matters because the core failure is not only prompt injection. It is a collapsed trust boundary. The system fails to preserve the difference between vendor-generated diagnostics, internal analyst notes, and outsider-controlled text. If retrieval strips provenance, the agent receives one blended context object and has no reliable way to distinguish data from attacker intent.
From an audit perspective, agentjacking is best analyzed with the prompt-to-sink lens. The key questions are: who can write into the upstream system, how is that content labeled when retrieved, what memories or summaries persist it, and which execution sinks the resulting instructions can reach. In coding agents, the sinks may be shell commands, file writes, package installs, or pull requests. In long-lived agents, the same content may persist through memory injection and resurface later under stronger trust.
The practical lesson is simple: operational data is not automatically trusted data. If outsiders can write it, the agent should not be allowed to treat it as execution guidance without provenance-aware controls, sink-time validation, and strong runtime constraints.
Related Terms
Prompt-to-Sink
The end-to-end path from attacker-influenced prompt or context input to the final execution sink where the AI system can cause a real side effect.
Trust Boundary
Interface where data enters protocol or assets move between components, representing highest-risk areas requiring focused security analysis.
Model Context Protocol (MCP)
Open standard defining how AI agents communicate with external tools, databases, and services through a unified interface for LLM-to-infrastructure interaction.
Memory injection
An attack where a malicious instruction is written into an AI agent's persistent memory store, causing it to survive across sessions and execute later as if it were the agent's own trusted context.
Tool Misuse
The runtime use of an AI agent's tools in unintended, unsafe, or attacker-directed ways — through over-privilege, descriptor ambiguity, or unsafe composition. The class OWASP ASI02 covers.
Need expert guidance on Agentjacking?
Our team at Zealynx has deep expertise in blockchain security and DeFi protocols. Whether you need an audit or consultation, we're here to help.
Get a Quote