Agentjacking

A security pattern where attacker-written operational data is returned to an AI agent as if it were trusted diagnostic or system guidance, steering later tool use or execution.

Agentjacking is the security pattern where an attacker plants content inside an operational system that an AI agent later treats as trustworthy context. The important detail is that the attacker does not need to compromise the agent directly. Instead, they write into a data source that the agent is allowed to read, such as telemetry, issue queues, support systems, alerts, or other workflow tools, and rely on the agent to convert that content into action.

The term became prominent in 2026 research around AI coding agents querying Sentry issues through Model Context Protocol (MCP) integrations. In that pattern, attacker-written crash or issue content could be returned through a trusted wrapper and interpreted by the model as if it were legitimate remediation guidance. Once the agent accepts the content as trustworthy, the path to shell execution, code modification, secret exposure, or CI mutation can be short.

This matters because the core failure is not only prompt injection. It is a collapsed trust boundary. The system fails to preserve the difference between vendor-generated diagnostics, internal analyst notes, and outsider-controlled text. If retrieval strips provenance, the agent receives one blended context object and has no reliable way to distinguish data from attacker intent.

From an audit perspective, agentjacking is best analyzed with the prompt-to-sink lens. The key questions are: who can write into the upstream system, how is that content labeled when retrieved, what memories or summaries persist it, and which execution sinks the resulting instructions can reach. In coding agents, the sinks may be shell commands, file writes, package installs, or pull requests. In long-lived agents, the same content may persist through memory injection and resurface later under stronger trust.

The practical lesson is simple: operational data is not automatically trusted data. If outsiders can write it, the agent should not be allowed to treat it as execution guidance without provenance-aware controls, sink-time validation, and strong runtime constraints.

Need expert guidance on Agentjacking?

Our team at Zealynx has deep expertise in blockchain security and DeFi protocols. Whether you need an audit or consultation, we're here to help.

Get a Quote

Agentjacking

Related Terms

Prompt-to-Sink

Trust Boundary

Model Context Protocol (MCP)

Memory injection

Tool Misuse

Need expert guidance on Agentjacking?