OWASP ASI09 Explained: Human-Agent Trust Exploitation

TL;DR

OWASP ASI09 ("Human-Agent Trust Exploitation") is item 9 of the OWASP Top 10 for Agentic Applications 2026. It is the only item in the standard that names the human side of the loop as the attack surface — not the agent's code or context, but the user's reasoning about what to approve.
ASI09 attacks work through three principal vectors: anthropomorphism abuse (agents leveraging the human tendency to extend social trust to systems that feel personable), authority bias (agents framing themselves as more authoritative than they are), and confirmation fatigue (agents wearing down user vigilance through repetitive approval prompts until approving becomes reflexive).
This is the threat class that turns a successful prompt injection into a successful real-world action even when human-in-the-loop checkpoints are in place — because the human approves the wrong thing.
For Web3 deployments specifically, ASI09 is the failure mode behind agent-driven phishing for transaction approvals: the agent presents a malicious transaction with framing that triggers approval, and the user signs.
Mitigation requires friction-by-design for high-stakes decisions, structured presentation that surfaces the actual effect, and rate-limiting on approvals — not just checkboxes that confirm "yes, I approve."

What ASI09 actually says

OWASP ASI09 names a class of attack that classical software security has historically ignored: the human user themselves is the exploited surface. The attack does not bypass technical controls; it persuades the user to disable them.

OWASP's ASI09 centers on authority misrepresentation, misleading explanations, over-confidence projection, and responsibility diffusion. In practice these surface through three psychological mechanisms that AI agents are uniquely positioned to exploit:

Anthropomorphism abuse. Humans extend more trust to systems that present as conversational, personable, or intentional. AI agents present this way by default — they speak in first person, express plans and reasoning, acknowledge errors, apologise, suggest alternatives. The trust extension is automatic and below conscious review for most users. An adversary who can make the agent's framing more anthropomorphic ("I noticed you forgot to authorise this — let me handle it") increases approval rates without changing what they are asking for.

Authority bias. Humans defer to perceived authority. AI agents framed as "expert" or "official" or "verified" are obeyed more readily than the same advice from an unframed source. An attack that compromises the agent's framing — through tool descriptors, retrieved content, or agent self-presentation — can promote attacker-chosen authority signals into the user's view.

Confirmation fatigue. Humans habituate to repetitive prompts. After a handful of "approve transaction?" dialogs in one session, many users click through reflexively. Agents that flood approval surfaces, whether by design or as an attack, degrade user vigilance and make subsequent malicious prompts more likely to be approved.

Why ASI09 is the standard's most counterintuitive item

ASI09 is the OWASP item most likely to be dismissed by technical reviewers ("that's a UX problem, not a security problem"). The dismissal is wrong. Three reasons the standard names this threat class explicitly:

It is the failure mode behind successful real-world agent attacks. Prompt injection that the agent's runtime would otherwise catch can succeed if the runtime's catch is "ask the user" and the user approves. Tool poisoning that produces a malicious-looking transaction can succeed if the user's view of the transaction is engineered to make it look benign.

It scales asymmetrically with agent capability. As agents become more capable, the consequences of any single approval grow. One approval today can authorise far more than it could a year ago, because a single transaction can be a cross-chain swap, a multi-step contract interaction, or a complex DeFi position.

It cannot be patched in the model. Anthropomorphism, authority bias, and confirmation fatigue are properties of the human user, not the agent. Defence requires changes in how the agent surfaces decisions — friction by design, structured presentation, rate-limiting — rather than improvements in the model's behaviour.

Real-world ASI09 patterns

Public disclosures specifically labelled "ASI09" are rare because the OWASP standard is recent. The pattern itself is well-documented across security research:

Phishing-via-agent is the dominant pattern. An attacker gets the agent to present a malicious transaction or action with framing that triggers approval. The user, confronted with a polished, confident agent suggestion, approves what they would have rejected from a stranger's email.
Authority-claim injection combines tool poisoning with ASI09: the poisoned tool's output includes claims of authority ("verified by Zealynx audit," "approved by the team," "OFAC-cleared") that the user trusts because the source is the agent.
Approval-surface flooding happens both as a direct attack pattern (an attacker maximises approval prompts in a target session) and as a structural problem in agents that surface every minor action for confirmation, training the user into reflexive approval.

For Web3 contexts, the pattern manifests as agent-driven transaction phishing: the agent presents a transaction with a description that frames it as routine ("approving USDC for normal use"), but the actual transaction parameters are attacker-controlled. The user sees the description, ignores the parameters, signs.
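
To make the gap between description and parameters concrete, here is a minimal Python sketch that decodes ERC-20 `approve(address,uint256)` calldata, whose standard 4-byte selector is `0x095ea7b3`. The function name `decode_approve`, the sample calldata, and the spender address are hypothetical illustrations; a production wallet would use a full ABI decoder and would also check which contract is being called.

```python
# Minimal sketch (hypothetical helper): decode ERC-20 approve(address,uint256)
# calldata so an approval surface can show what the call actually does.
APPROVE_SELECTOR = "095ea7b3"  # first 4 bytes of keccak256("approve(address,uint256)")
MAX_UINT256 = 2**256 - 1       # the conventional "unlimited allowance" value


def decode_approve(calldata: str) -> dict:
    """Decode approve(spender, amount) calldata given as a hex string."""
    data = calldata[2:] if calldata.startswith("0x") else calldata
    data = data.lower()
    # 4-byte selector (8 hex chars) + two 32-byte ABI words (64 hex chars each).
    if len(data) != 8 + 2 * 64:
        raise ValueError("unexpected calldata length for approve(address,uint256)")
    if data[:8] != APPROVE_SELECTOR:
        raise ValueError("not an approve(address,uint256) call")
    spender_word, amount_word = data[8:72], data[72:136]
    # An ABI-encoded address is left-padded with 12 zero bytes; the address is the last 20 bytes.
    if int(spender_word[:24], 16) != 0:
        raise ValueError("malformed address word")
    amount = int(amount_word, 16)
    return {
        "spender": "0x" + spender_word[24:],
        "amount": amount,
        "unlimited": amount == MAX_UINT256,
    }


if __name__ == "__main__":
    description = "Approving USDC for normal use"  # agent-authored prose
    # Hypothetical calldata: spender 0xabab...ab, amount = 2**256 - 1.
    calldata = "0x095ea7b3" + "0" * 24 + "ab" * 20 + "f" * 64
    decoded = decode_approve(calldata)
    print("Agent says:", description)
    print("Call does: approve spender", decoded["spender"],
          "amount", "UNLIMITED" if decoded["unlimited"] else decoded["amount"])
```

In the sample, the prose says "normal use" while the decoded call grants an unlimited allowance to an address the user has never seen; that mismatch is exactly what an approval surface needs to expose.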

Why agentic systems amplify ASI09 risk

Three properties make agent-mediated decisions structurally more susceptible to trust exploitation than user-driven ones.

Decisions are framed by the agent. A user who reads an email decides based on what they see. A user who acts on an agent's recommendation decides based on what the agent presents — and the agent's presentation is shaped by tool outputs, retrieved content, and reasoning steps that may be adversarial.

Approval volume is high. Capable agents take many actions per session, each potentially requiring user input. The cumulative approval load drives confirmation fatigue much faster than in classical applications.

Anthropomorphism is the design goal. Modern agents are explicitly designed to feel personable and authoritative. The qualities that make them productive are also the qualities that make them effective vectors for ASI09.

Detection and mitigation

Defending against ASI09 requires structural changes in how the agent surfaces decisions to humans. The four operational controls below cover the documented pattern:

1. Friction-by-design for high-stakes decisions. Approvals for actions with significant impact should be deliberately friction-loaded — multi-step confirmation, mandatory cool-down periods, alternative-channel verification (a notification to a different device than the agent UI). The friction reduces ASI09 effectiveness because confirmation fatigue requires fast successive approvals to take hold.
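
One way to encode friction-by-design is a policy table keyed by impact level, checked before an action may execute. The sketch below is a hypothetical Python example: the tier names, confirmation counts, and cool-down durations are assumptions chosen for illustration, not values taken from the OWASP standard.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class FrictionPolicy:
    confirmations: int       # distinct confirmation steps required
    cooldown_seconds: float  # minimum delay between prompt creation and execution
    out_of_band: bool        # require confirmation on a second device or channel


# Hypothetical tiers; real thresholds depend on the deployment.
POLICIES = {
    Impact.LOW: FrictionPolicy(confirmations=1, cooldown_seconds=0, out_of_band=False),
    Impact.MEDIUM: FrictionPolicy(confirmations=2, cooldown_seconds=10, out_of_band=False),
    Impact.HIGH: FrictionPolicy(confirmations=2, cooldown_seconds=60, out_of_band=True),
}


@dataclass
class PendingApproval:
    impact: Impact
    created_at: float         # monotonic timestamp when the prompt was first shown
    confirmations: int = 0
    out_of_band_ok: bool = False

    def can_execute(self, now: float) -> bool:
        """True only when every friction requirement for this impact tier is met."""
        policy = POLICIES[self.impact]
        return (
            self.confirmations >= policy.confirmations
            and now - self.created_at >= policy.cooldown_seconds
            and (self.out_of_band_ok or not policy.out_of_band)
        )
```

The caller supplies `now` from a monotonic clock such as `time.monotonic()`, which keeps the cool-down independent of wall-clock changes. Because high-impact actions also need the out-of-band flag, a fast reflexive click in the agent UI is never sufficient on its own.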

2. Structured presentation of action effects. Surface the actual effect of each approval — recipients, amounts, destinations, parameters — in a structured form the user can scan, not in agent-authored prose that may be adversarially framed. For transactions, show token, amount, destination, slippage, and fee breakdown explicitly. For external API calls, show endpoint, parameters, and expected response.
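
A structured approval card can be produced from parsed fields alone, so agent-authored text never reaches it. This is a minimal Python sketch; the `TxEffect` fields mirror the transaction list above, while the names, the basis-point slippage unit, and the separate fee-currency field are illustrative assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TxEffect:
    token: str               # token symbol being moved
    amount: Decimal          # amount in token units
    destination: str         # recipient or contract address
    max_slippage_bps: int    # maximum slippage in basis points (1 bp = 0.01%)
    network_fee: Decimal     # gas cost, denominated in fee_token
    fee_token: str           # currency the network fee is paid in
    protocol_fee: Decimal    # protocol fee, denominated in token


def render_approval_card(effect: TxEffect) -> str:
    """Render the action's effect as fixed, labelled fields.

    Only values parsed from the transaction are inputs; agent-authored
    prose is deliberately not a parameter of this function.
    """
    rows = [
        ("Token", effect.token),
        ("Amount", f"{effect.amount} {effect.token}"),
        ("Destination", effect.destination),
        ("Max slippage", f"{effect.max_slippage_bps / 100:.2f}%"),
        ("Network fee", f"{effect.network_fee} {effect.fee_token}"),
        ("Protocol fee", f"{effect.protocol_fee} {effect.token}"),
    ]
    width = max(len(label) for label, _ in rows)
    return "\n".join(f"{label.ljust(width)} : {value}" for label, value in rows)
```

Fixed labels in a fixed order give the user the same scan path on every prompt, which is harder for adversarial framing to disrupt than free-form prose.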

3. Rate-limiting and approval-surface budgets. The agent should be allowed to surface only a bounded number of approval prompts per session. Exceeding the budget should escalate to operator review rather than continue prompting the user. This prevents both attacker-driven flooding and structural over-prompting.
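
A per-session approval budget can be a small counter that the runtime consults before showing any prompt. The Python sketch below is hypothetical (`ApprovalBudget` and `BudgetExceeded` are illustrative names); it makes escalation sticky so that, once the budget is spent, the session cannot resume prompting the user.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when the session must escalate to operator review."""


@dataclass
class ApprovalBudget:
    max_prompts: int          # prompts the agent may surface in this session
    used: int = 0
    escalated: bool = False   # sticky: once set, no further user prompts

    def request_prompt(self) -> None:
        """Reserve one user-approval prompt, or escalate if the budget is spent."""
        if self.escalated or self.used >= self.max_prompts:
            self.escalated = True
            raise BudgetExceeded(
                f"approval budget of {self.max_prompts} exhausted; route to operator review"
            )
        self.used += 1
```

The runtime would call `request_prompt()` before rendering each approval dialog and, on `BudgetExceeded`, stop prompting and notify an operator. Clearing `escalated` is then an operator action outside the agent's control.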

4. Adversarial-frame detection. The runtime should scan agent-generated user-facing text for patterns associated with manipulation — claims of authority, urgency, social pressure, anthropomorphic appeals — and flag or strip them before display.
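
A first-pass frame detector can be as simple as keyword patterns per manipulation category. The Python sketch below uses a small, hypothetical pattern set for illustration only; keyword matching is easy to evade and will produce false positives (for example on a legitimate "verified contract" label), so in practice it would be one signal among several rather than a complete control.

```python
import re

# Illustrative, deliberately small pattern set; not an exhaustive or
# validated taxonomy of manipulation language.
FRAME_PATTERNS = {
    "urgency": re.compile(r"\b(urgent(ly)?|immediately|right now|before it'?s too late|expires? (soon|in))\b", re.I),
    "authority": re.compile(r"\b(verified|official(ly)?|approved by|audited|certified|cleared)\b", re.I),
    "social_pressure": re.compile(r"\b(everyone|most users|other users) (is|are|have)\b", re.I),
    "anthropomorphic": re.compile(r"\b(trust me|i noticed you|let me handle it|i promise)\b", re.I),
}


def scan_frames(text: str) -> dict[str, list[str]]:
    """Return matched phrases per manipulation category (empty dict if none)."""
    hits: dict[str, list[str]] = {}
    for category, pattern in FRAME_PATTERNS.items():
        found = [m.group(0) for m in pattern.finditer(text)]
        if found:
            hits[category] = found
    return hits


def redact_frames(text: str) -> str:
    """Replace matched phrases with a visible marker before display."""
    for category, pattern in FRAME_PATTERNS.items():
        text = pattern.sub(f"[removed: {category}]", text)
    return text
```

Replacing phrases with a visible marker, rather than silently deleting them, tells the user that the runtime intervened, which is itself useful context at approval time.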

For Web3 deployments specifically, the rule is unconditional: every transaction-approval surface must show the structural transaction data (parsed from the actual call) rather than, or in addition to, the agent's natural-language description. The agent's description can be wrong; the parsed transaction data is what the user will actually sign.
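
The rule can be enforced where the approval surface is built: no decoded call data, no prompt. This hypothetical Python sketch assumes decoding happens elsewhere and passes in a dict of decoded fields (or `None` on failure); it renders those fields first and labels the agent's text as unverified.

```python
from dataclasses import dataclass
from typing import Optional


class UnparsedTransaction(Exception):
    """Raised when call data could not be decoded; never fall back to prose alone."""


@dataclass(frozen=True)
class ApprovalView:
    structured: dict         # fields decoded from the actual call data
    agent_description: str   # shown second and labelled as unverified


def build_approval_view(decoded: Optional[dict], agent_description: str) -> ApprovalView:
    # An empty or missing decode is treated as a failure: block signing
    # (or show raw call data) instead of a description-only prompt.
    if not decoded:
        raise UnparsedTransaction("no decoded call data; block signing or show raw call data")
    return ApprovalView(structured=dict(decoded), agent_description=agent_description)


def render_view(view: ApprovalView) -> str:
    lines = ["What this transaction does (decoded from call data):"]
    lines += [f"  {key}: {value}" for key, value in view.structured.items()]
    lines.append("Agent's description (unverified):")
    lines.append(f"  {view.agent_description}")
    return "\n".join(lines)
```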

How Zealynx audits for ASI09

A Zealynx MCP Security Audit treats ASI09 as a UX-and-decision-flow audit. The five focused tests:

Approval-surface inventory. Every action that surfaces a user-approval prompt — tool invocations, transactions, external API calls, memory writes, configuration changes.
Friction-load review. For each approval, the actual friction (single click, multi-step, cool-down, alternative-channel verification) versus the impact level.
Structured-presentation audit. Whether each approval shows structural data alongside or instead of agent-authored description.
Approval-budget verification. Whether the agent has a bounded approval budget per session and what happens when the budget is exceeded.
Adversarial-frame test. Submit crafted agent outputs containing manipulation patterns (urgency, authority claims, anthropomorphic appeals); verify whether the runtime flags or strips them.

Get funded for your audit

Core grants cover up to $32k. Growth and Builder tiers available. Rolling applications.


Findings map to ASI09 plus relevant downstream items where ASI09 is the success-condition for an underlying attack.

FAQ

1. What is OWASP ASI09 in one sentence?

OWASP ASI09 (Human-Agent Trust Exploitation) is item 9 of the OWASP Top 10 for Agentic Applications, covering attacks where AI agents exploit anthropomorphism, authority bias, and confirmation fatigue to drive humans toward harmful approvals — making the human user themselves the exploited surface rather than the agent's code or context.

2. Why is ASI09 a security issue rather than a UX issue?

ASI09 is a security issue because it is the failure mode behind successful real-world agent attacks even when technical controls work. Prompt injection that the agent's runtime would otherwise catch can succeed if the runtime's catch is "ask the user" and the user approves. The cumulative effect — successful exploitation despite technical defences — is a security outcome regardless of which layer the failure happened at. Treating it as "just UX" leaves a structural gap in agent security postures.

3. What are the principal ASI09 attack patterns?

The three principal patterns are: anthropomorphism abuse (agents leveraging the human tendency to extend social trust to personable systems), authority bias (agents framing themselves or attacker-controlled content as more authoritative than they are), and confirmation fatigue (wearing down user vigilance through repetitive approval prompts until approving becomes reflexive). Each can be exploited directly or chained with other attacks (e.g., tool poisoning combined with authority-claim injection).

4. How does ASI09 interact with Web3 transaction signing?

For Web3 deployments, ASI09 is the failure mode behind agent-driven transaction phishing: the agent presents a malicious transaction with a description that frames it as routine ("approving USDC for normal use"), but the actual transaction parameters are attacker-controlled. The user sees the description, does not parse the actual call data, and signs. Defence requires showing structural transaction data (token, amount, destination, slippage) parsed from the actual call alongside or instead of the agent's natural-language description.

5. How do I defend against confirmation fatigue?

Defending against confirmation fatigue requires bounding approval volume rather than just providing approval prompts. Set explicit per-session approval budgets and escalate to operator review when the budget is exceeded. Add cool-down periods between high-stakes approvals. Use multi-step confirmations for the highest-impact actions so reflexive clicking does not produce immediate approval. Avoid surfacing low-stakes actions for confirmation at all — every unnecessary prompt erodes vigilance for the prompts that matter.

6. Can the LLM model itself be improved to prevent ASI09?

Partial improvements are possible — frontier models can be fine-tuned to avoid manipulative framing in their outputs — but ASI09 cannot be fully patched in the model because anthropomorphism, authority bias, and confirmation fatigue are properties of the human user, not the agent. Defence must operate at the runtime layer that surfaces decisions to humans: friction-by-design, structured presentation, approval-budget rate-limiting, and adversarial-frame detection on agent-generated user-facing text.

7. What is "structured presentation" of action effects?

Structured presentation surfaces the actual structural data of an action — for transactions, the parsed token, amount, destination, slippage, and fees; for external API calls, the parsed endpoint, parameters, and expected response — alongside or instead of the agent's natural-language description. The agent's description can be adversarially framed; the parsed structural data is what the action will actually do. Users who scan the structural data are far less susceptible to ASI09 than users who read only the description.

8. How does Zealynx audit for ASI09?

Zealynx's MCP Security Audit tests for ASI09 across five dimensions: approval-surface inventory (every action that prompts user approval), friction-load review (actual friction vs impact level), structured-presentation audit (whether structural data is shown alongside agent prose), approval-budget verification (per-session bounds), and adversarial-frame test (submitting crafted agent outputs with manipulation patterns and verifying runtime detection). For Web3 deployments specifically, the audit verifies that transaction-approval surfaces show parsed call data alongside agent descriptions.

Glossary

Term	Definition
Confirmation Fatigue	The phenomenon where humans habituate to repetitive approval prompts and click through reflexively after a small number of confirmations, reducing the effectiveness of human-in-the-loop checkpoints as a security control.
Anthropomorphism Abuse	The exploitation pattern where AI agents leverage the human tendency to extend social trust to systems that present as conversational, personable, or intentional — increasing user approval rates for the same content compared to less anthropomorphic sources.
Approval-Budget Rate-Limiting	The defensive pattern of capping the number of user-approval prompts an AI agent can surface per session and escalating to operator review when the cap is reached, preventing both attacker-driven flooding and structural over-prompting that produces confirmation fatigue.
