Memory Poisoning

An attack where adversaries corrupt entries in an AI agent's persistent memory store (preferences, summaries, learned facts) to bias future reasoning across sessions. The corruption persists until detected, biasing every retrieval that touches the poisoned entries.

Memory Poisoning is an attack where adversaries corrupt entries in an AI agent's persistent memory store — user preferences, task-context summaries, learned facts, accumulated knowledge — to bias future reasoning across sessions. Like RAG poisoning, it is one of the principal failure modes inside OWASP ASI06 (Memory and Context Poisoning), but the attack surface is different: memory poisoning targets the agent's own state, not the external knowledge corpus.

The attack works because most agent runtimes treat memory entries as authoritative context. When the agent reads from memory at the start of a new session or task, the contents are loaded into the prompt with the same trust level as the system prompt itself. An attacker who can write to memory — through compromised session, indirect-injection-to-memory chaining, or direct API access — inserts entries that bias future retrievals. The corruption lives in the memory store until detected, biasing every retrieval that touches the poisoned entries.

Why Memory Poisoning Is Especially Dangerous

Three properties compound the risk. Memory writes are typically autonomous. Agents write to memory based on their own reasoning ("the user prefers X, I'll remember that"); the operator does not authorise each write. Memory entries lack provenance metadata in most production implementations. Once written, the entry looks like every other memory entry. Memory persists across sessions and often across users in shared-memory architectures. A single successful poison influences many future invocations.

The chaining pattern is particularly insidious: indirect prompt injection in a document the agent reads → instruction to write to memory → agent writes attacker-chosen entry → all future sessions retrieve the poisoned entry → every future task is biased. The attack window is one document; the impact is unbounded sessions.

Defensive Patterns

The structurally sound defences operate at the write boundary. Reject autonomous memory writes — the agent reads from memory but does not write to it without explicit authorisation. Audit memory writes when they are permitted — record the context that produced the write, the authority that authorised it, and an explicit user-approval path for any write that influences high-stakes decisions. Provenance-tag every memory entry at write time so retrieval can weight or filter entries by their origin and authority.

For Web3 deployments, the rule is unconditional: agent memory entries that influence transactions, signing, or infrastructure changes must require explicit per-write user authorisation. Treating "the agent learned this last session" as ongoing authorisation is a memory-poisoning-driven fund-loss primitive. For deeper guidance, see the OWASP ASI06 explainer.

Need expert guidance on Memory Poisoning?

Our team at Zealynx has deep expertise in blockchain security and DeFi protocols. Whether you need an audit or consultation, we're here to help.

Get a Quote