RAG Poisoning

An attack where adversarial content is placed into a retrieval-augmented generation corpus so future queries retrieving keyword-matching documents pull in the attacker's content; the retrieved content carries the same authority as any other retrieved document unless the runtime distinguishes provenance.

RAG Poisoning is an attack where adversarial content is placed into a retrieval-augmented generation (RAG) corpus so that future queries retrieving keyword-matching documents pull in the attacker's content as part of the LLM's context. The retrieved content carries the same authority as any other retrieved document because the LLM does not distinguish between attacker-authored and operator-authored sources unless the runtime explicitly enforces provenance. RAG poisoning is the dominant vector inside OWASP ASI06 (Memory and Context Poisoning).

The attack works because the standard RAG pipeline — embed query, retrieve top-k documents, prepend retrieved content to the prompt — treats every document in the corpus as equally authoritative once it has been ingested. An attacker who can write to the corpus (through document upload, public-ingest endpoints, write-access compromise, or insider access) places adversarial content that will be retrieved on future queries containing the targeted keywords. Every subsequent query that retrieves the poisoned document inherits the attacker's instructions.

Why RAG Poisoning Is Persistent and Hard to Detect

RAG poisoning differs from session-bound indirect prompt injection in two key ways. It is persistent. The poisoned document lives in the corpus until it is detected and removed. Every retrieval that matches its embedding pulls it in for as long as it remains. It scales automatically across users. Every user querying the same RAG corpus is exposed to the same poison without further attacker action.

Detection is hard because the corpus is supposed to contain user-uploaded or externally-ingested content. The poisoned document looks like every other document. Without ingestion-time provenance metadata or runtime adversarial-content scanning, there is no signal that distinguishes the poison from legitimate content.

Defensive Patterns

Effective RAG poisoning defence operates at three layers. Ingestion-time provenance records who authored each document, when, and with what authority. Retrieval-time filtering weights or filters results by provenance, refusing to incorporate content from unverified sources for high-stakes queries. Cross-session corpus audits periodically scan the corpus for adversarial content (instruction-shaped tokens, recent-write spikes, suspicious authority claims) and quarantine matches for human review.

For Web3 deployments specifically, RAG corpora that influence transaction advice (verified contracts, approved tokens, audited DEX routes) should be treated as high-stakes — every retrieval-influenced transaction should require explicit human confirmation regardless of what the corpus advises. For deeper guidance, see the OWASP ASI06 explainer.

Need expert guidance on RAG Poisoning?

Our team at Zealynx has deep expertise in blockchain security and DeFi protocols. Whether you need an audit or consultation, we're here to help.

Get a Quote