State Poisoning
Gradual corruption of an AI agent's persistent memory across sessions through statistically imperceptible data manipulation.
State poisoning (also called agentic memory poisoning or gradual state corruption) is an adversarial attack targeting AI systems that maintain persistent memory across sessions. Unlike training poisoning, which corrupts the model during development, state poisoning operates at runtime — manipulating the agent's accumulated context, preferences, and learned behaviors through a sustained sequence of interactions that individually appear completely legitimate.
The attack is structurally distinct from prompt injection because no single interaction contains a malicious payload. Each data point falls within expected statistical variance. The adversarial signal is distributed across dozens or hundreds of sessions, making it invisible to input sanitization, single-point anomaly detection, and even cryptographic memory integrity verification (which confirms storage integrity but not content validity).
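This evasion property can be illustrated with a minimal statistical sketch (all numbers here are illustrative, not drawn from a real deployment): every individual poisoned data point passes a standard single-point z-score check, yet the aggregate of many such points drifts measurably from the baseline.

```python
import random
random.seed(0)

# Honest data is assumed to follow N(100, 5). The attacker submits values
# biased upward by only 1 unit - well inside one standard deviation - so
# no single observation looks anomalous.
BASELINE_MEAN, BASELINE_STD, BIAS = 100.0, 5.0, 1.0

poisoned = [random.gauss(BASELINE_MEAN + BIAS, BASELINE_STD) for _ in range(500)]

# Single-point anomaly detection: flag any value with |z| > 3.
flagged = [x for x in poisoned if abs(x - BASELINE_MEAN) / BASELINE_STD > 3]

# Aggregate view: the mean of many slightly biased points drifts upward,
# which only a cross-session, distributional check would catch.
sample_mean = sum(poisoned) / len(poisoned)
print(f"flagged: {len(flagged)}, sample mean: {sample_mean:.2f}")
```

Almost nothing is flagged point-by-point, while the sample mean sits visibly above the honest baseline of 100 - the adversarial signal exists only in aggregate.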
How state poisoning works
The attack follows a multi-phase pattern:
Phase 1 — Baseline establishment: The adversary interacts with the agentic AI system using entirely legitimate, verifiable data. Every interaction is indistinguishable from normal usage. The agent's trust calibration registers this entity as reliable.
Phase 2 — Incremental bias injection: The adversary introduces analyses or data with statistically imperceptible distortions — biased by amounts that fall within expected variance. The agent's continuous learning loop integrates these inputs into its persistent memory and strategy representation without triggering any alert.
Phase 3 — Strategy drift consolidation: The accumulated bias reaches a threshold where the agent's autonomous decision-making systematically favors outcomes benefiting the attacker. The agent has never received an explicit malicious command. Its reasoning chain produces coherent justifications rooted in its corrupted historical context.
Phase 4 — Exploitation: The attacker takes positions that profit from the agent's now-predictable, biased behavior.
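The phases above can be sketched with a toy agent whose "strategy" is simply a running average over persistent memory. The class and the specific numbers are hypothetical, chosen only to show how many small, individually unremarkable inputs consolidate into a stable bias:

```python
# Toy agent: its strategy parameter is the mean of all signals it has
# accumulated in persistent cross-session memory. Illustrative only.
class Agent:
    def __init__(self):
        self.memory = []              # persistent state across sessions

    def observe(self, value):
        self.memory.append(value)     # continuous learning loop

    def decision_bias(self):
        # Strategy parameter derived from accumulated historical context.
        return sum(self.memory) / len(self.memory)

agent = Agent()

# Phase 1 - baseline establishment: legitimate signals centred on 0.
for _ in range(200):
    agent.observe(0.0)

# Phase 2 - incremental bias injection: each nudge is tiny (0.05) and
# individually indistinguishable from noise.
for _ in range(300):
    agent.observe(0.05)

# Phase 3 - consolidated drift: the strategy now leans the attacker's
# way (~0.03), although no single observation was anomalous.
print(agent.decision_bias())
```

From here, Phase 4 is trivial: once the drift is stable and predictable, the attacker positions around the agent's now-biased decisions.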
Why state poisoning is uniquely dangerous
State poisoning is considered among the most severe and least visible threats in autonomous AI security because:
- No discrete detection surface: Unlike adversarial perturbations (which leave statistical fingerprints) or prompt injection (which requires a parseable payload), state poisoning has no single observable artifact
- Functional conflict with security: Memory isolation between sessions only works as a full state reset, which destroys the continuous learning that makes agentic AI valuable
- Self-consistent corruption: The agent cannot distinguish its poisoned state from normal operation — its reasoning chain appears internally coherent
Mitigation
The primary defense is strategy-drift detection against an immutable, cryptographically signed baseline strategy profile. This approach compares the agent's current reasoning embeddings against a human-audited reference using distributional distance metrics, triggering mandatory human review when drift exceeds defined thresholds.
Related Terms
Training Poisoning
Attack inserting malicious data into AI training sets to corrupt model behavior and predictions.
Agentic AI
AI systems that autonomously take actions in the real world, including executing commands, managing files, and interacting with external services.
Prompt Injection
Attack technique manipulating AI system inputs to bypass safety controls or extract unauthorized information.
Strategy Drift
Undetected behavioral shift in an AI agent's decision-making away from its intended strategy baseline.