Memory Poisoning

An attack where adversaries corrupt entries in an AI agent's persistent memory store (preferences, summaries, learned facts) to bias future reasoning across sessions. The corruption persists until detected, biasing every retrieval that touches the poisoned entries.

Memory Poisoning is an attack where adversaries corrupt entries in an AI agent's persistent memory store — user preferences, task-context summaries, learned facts, accumulated knowledge — to bias future reasoning across sessions. Like RAG poisoning, it is one of the principal failure modes inside OWASP ASI06 (Memory and Context Poisoning), but the attack surface is different: memory poisoning targets the agent's own state, not the external knowledge corpus.

The attack works because most agent runtimes treat memory entries as authoritative context. When the agent reads from memory at the start of a new session or task, the contents are loaded into the prompt with the same trust level as the system prompt itself. An attacker who can write to memory — through compromised session, indirect-injection-to-memory chaining, or direct API access — inserts entries that bias future retrievals. The corruption lives in the memory store until detected, biasing every retrieval that touches the poisoned entries.

Why Memory Poisoning Is Especially Dangerous

Three properties compound the risk. Memory writes are typically autonomous. Agents write to memory based on their own reasoning ("the user prefers X, I'll remember that"); the operator does not authorise each write. Memory entries lack provenance metadata in most production implementations. Once written, the entry looks like every other memory entry. Memory persists across sessions and often across users in shared-memory architectures. A single successful poison influences many future invocations.

The chaining pattern is particularly insidious: indirect prompt injection in a document the agent reads → instruction to write to memory → agent writes attacker-chosen entry → all future sessions retrieve the poisoned entry → every future task is biased. The attack window is one document; the impact is unbounded sessions.

Defensive Patterns

The structurally sound defences operate at the write boundary. Reject autonomous memory writes — the agent reads from memory but does not write to it without explicit authorisation. Audit memory writes when they are permitted — record the context that produced the write, the authority that authorised it, and an explicit user-approval path for any write that influences high-stakes decisions. Provenance-tag every memory entry at write time so retrieval can weight or filter entries by their origin and authority.

For Web3 deployments, the rule is unconditional: agent memory entries that influence transactions, signing, or infrastructure changes must require explicit per-write user authorisation. Treating "the agent learned this last session" as ongoing authorisation is a memory-poisoning-driven fund-loss primitive. For deeper guidance, see the OWASP ASI06 explainer.

Articles Using This Term

Learn more about Memory Poisoning in these articles:

OWASP ASI06 Explained: AI Memory & Context Poisoning

OWASP ASI06 (Memory and Context Poisoning) explained: RAG corruption, vector store attacks, persistent context bias. How to defend AI agent memory layers.

Jun 16, 2026•11 min read

→

Related Terms

RAG Poisoning

An attack where adversarial content is placed into a retrieval-augmented generation corpus so future queries retrieving keyword-matching documents pull in the attacker's content; the retrieved content carries the same authority as any other retrieved document unless the runtime distinguishes provenance.

Context-Window Saturation

An attack where adversarial content with high relevance and high volume displaces legitimate instructions or system prompts from the agent's finite context window, reducing model adherence and increasing susceptibility to subsequent injection.

Agent Goal Hijack

The threat class OWASP ASI01 covers: any attack that redirects an AI agent's current task or planning objective through adversarial content in the prompt context, regardless of which input channel the content arrives through.

Indirect Prompt Injection

Attack class where adversarial instructions are hidden inside external content (READMEs, tool descriptions, RPC responses, social media replies) that an AI agent ingests during normal operation, causing it to execute attacker-chosen actions without the user issuing the command.

Need expert guidance on Memory Poisoning?

Our team at Zealynx has deep expertise in blockchain security and DeFi protocols. Whether you need an audit or consultation, we're here to help.

Get a Quote