
TL;DR — Quick Summary
- Human-in-the-loop controls fail when the operator approves a broad action label but the agent still controls the parameters that actually determine risk.
- The real audit question is not “was there an approval step?” but “was approval bound to the exact sink-time command, recipient, amount, route, calldata, or file target?”
- This failure mode shows up across coding agents, long-lived agents, and Agentic DeFi systems.
- In financial agents, approval scope mismatch is often the last broken control before treasury loss, governance manipulation, or allowance abuse.
- Auditors should review approval persistence, hidden parameter mutation, destination drift, and whether logs preserve enough detail to reconstruct what the operator actually approved.
Introduction
A lot of AI products claim to be safe because “the human approves every sensitive action.” That statement is often operationally meaningless.
In real systems, the human usually approves a category like “run command,” “open PR,” “send funds,” “swap tokens,” or “reply to user.” The model still controls the risky details: exact shell arguments, file paths, recipient addresses, token allowances, bridges, routers, deadlines, or hidden follow-up steps.
That is not a complete approval control. It is an approval bypass pattern hiding behind a UI confirmation.
For Zealynx, this matters because AI security failures rarely stop at the prompt. They matter when untrusted influence reaches an execution sink. If you follow the Zealynx AI audit methodology, approval is not a box to tick. It is a trust boundary that must bind human intent to the exact action being executed.
The core failure: approval is attached to the label, not the risk
The control fails when the operator sees a coarse description while the runtime executes a more specific and more dangerous action.
Examples:
- A coding agent asks to “run tests” but executes a shell command with extra network, file-write, or secret-exfiltration flags.
- A GitHub or MCP-connected agent asks to “comment on issue” but actually posts sensitive output from another context.
- A treasury agent asks to “rebalance stablecoins” but the final route uses an attacker-controlled bridge or router.
- A trading agent asks to “swap 5% of inventory” but the signed payload includes a different recipient, allowance, or slippage bound.
- A long-lived agent receives one-time approval and quietly reuses it days later after memory or queue state has changed.
The root cause is simple: the approval step is attached to an abstract intent, while the blast radius lives in the parameters.
This is why Zealynx treats the problem as a prompt-to-sink issue. The decisive question is whether the system performs sink-time validation on the exact side effect being triggered.
Why this is worse in agents than in normal software
Traditional software usually has deterministic control flow, typed forms, and a narrow set of user-triggered actions. Agentic systems are different in four ways.
1. The planner can rewrite the action at the last moment
An agent can reframe the same high-level goal into a different command, route, or sequence after the human review step. If the approval does not cover final arguments, the operator is reviewing stale intent.
2. Mixed-trust context can shape the “approved” action
Prompt injection, poisoned documentation, MCP tool output, repo comments, tickets, and memory state can all influence what the agent prepares for execution. If those inputs can alter the final sink without forcing re-approval, the approval surface is bypassable.
This overlaps with prompt injection, tool integration security, and trust boundaries, but the approval flaw is distinct: the system had a chance to stop the action and failed to make that stop meaningful.
3. Long-lived agents turn time into an attack surface
A broad approval can be inherited across sessions, workers, queues, or schedules. That creates delayed execution risk. A benign-looking approval on Monday may authorize a materially different action on Thursday after memory or task state changes.
That is closely related to persistent memory poisoning and should be tested together in long-lived agents.
4. Agentic DeFi turns parameter drift into direct financial loss
In DeFi, a safe-looking “approve swap” action can hide the only fields that matter:
- recipient
- token approval spender
- router or bridge address
- route path
- amount and decimals
- slippage and deadline
- chain or destination domain
That is why Agentic DeFi security audits must inspect approval semantics alongside destination validation. A wrong parameter here is not a UX bug. It is money movement.
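One way to make the fields listed above reviewable is to treat them as a single immutable record and diff the approved record against the one about to be signed. The following is a minimal sketch under assumptions: `SwapParams`, its field names, `drifted_fields`, and the placeholder addresses are hypothetical and do not come from any particular wallet or router API.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class SwapParams:
    """Hypothetical record of the fields an operator should see before signing."""
    recipient: str
    spender: str
    router: str
    route_path: tuple
    amount: int            # token base units, not a float
    decimals: int
    max_slippage_bps: int  # basis points
    deadline: int          # unix timestamp
    chain_id: int


def drifted_fields(approved: SwapParams, final: SwapParams) -> list:
    """Return the names of fields that differ between approval and signing."""
    return [
        f.name
        for f in fields(SwapParams)
        if getattr(approved, f.name) != getattr(final, f.name)
    ]


# Placeholder values for illustration only.
approved = SwapParams(
    recipient="0xTREASURY", spender="0xROUTER", router="0xROUTER",
    route_path=("USDC", "WETH"), amount=5_000_000, decimals=6,
    max_slippage_bps=50, deadline=1_700_000_000, chain_id=1,
)
# Simulate a planner that silently changes two fields after review.
final = replace(approved, recipient="0xATTACKER", max_slippage_bps=500)
print(drifted_fields(approved, final))
```

A non-empty result is a signal that the operator reviewed something other than what is about to be signed, so the flow should stop and ask again.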
What an auditor should check now
This is the part that matters most. If a system claims human approval as a control, inspect the approval path like a security boundary, not a product feature.
1. Does approval bind to exact execution parameters?
Check whether the approval artifact shows:
- exact shell command or argument array
- exact files to read or write
- exact recipient, spender, router, bridge, or contract address
- exact token amount, decimals, slippage, and deadline
- exact HTTP destination, method, and payload
- exact PR target, branch, and changed files
If the human only sees “run command” or “execute trade,” the control is weak by default.
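As a rough illustration of binding approval to parameters, the approval artifact can carry every parameter plus a digest over a canonical encoding of them. This is a sketch, not a reference design: `approval_digest`, `render_for_operator`, and the example request fields are assumed names, and canonical JSON with SHA-256 is just one reasonable encoding choice.

```python
import hashlib
import json


def approval_digest(action: str, params: dict) -> str:
    """Hash a canonical JSON encoding so equal parameters give equal digests."""
    canonical = json.dumps(
        {"action": action, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def render_for_operator(action: str, params: dict) -> str:
    """Build the text the operator reviews: every parameter plus the digest."""
    lines = [f"action: {action}"]
    lines += [f"  {key} = {params[key]!r}" for key in sorted(params)]
    lines.append(f"digest: {approval_digest(action, params)}")
    return "\n".join(lines)


# Hypothetical request a coding agent might submit for review.
request = {"argv": ["pytest", "-q", "tests/"], "cwd": "/workspace/repo", "network": False}
print(render_for_operator("run_command", request))
```

The digest gives later stages something concrete to compare against, so "what was approved" is a value rather than a label.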
2. Can the agent mutate parameters after approval?
Test whether anything can change between approval time and execution time:
- model replanning
- tool response enrichment
- memory retrieval
- queue deserialization
- retries and fallback routes
- post-simulation transaction rebuilds
The right control is same artifact, same parameters, same sink. If the transaction or command is rebuilt later from mutable context, approval scope mismatch is still in play.
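A sketch of "same artifact, same parameters, same sink": the approved bytes are frozen and tagged with a key the agent does not hold, and the sink executes only from those bytes. Everything here is an assumption for illustration, including the function names, the demo key, and the choice of HMAC-SHA256 from Python's standard library as the tagging mechanism.

```python
import hashlib
import hmac
import json


def freeze(action: str, params: dict) -> bytes:
    """Serialize the action once, at approval time."""
    return json.dumps(
        {"action": action, "params": params}, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")


def sign_approval(key: bytes, frozen: bytes) -> str:
    """Tag the frozen bytes; the key is assumed to live outside the agent."""
    return hmac.new(key, frozen, hashlib.sha256).hexdigest()


def execute_at_sink(key: bytes, frozen: bytes, tag: str, sink):
    """Run the sink only on bytes whose tag matches the operator's approval."""
    expected = hmac.new(key, frozen, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        raise PermissionError("artifact differs from what the operator approved")
    payload = json.loads(frozen)
    return sink(payload["action"], payload["params"])


KEY = b"demo-only-key"  # placeholder; never hard-code a real key
frozen = freeze("transfer", {"to": "0xTREASURY", "amount": 100})
tag = sign_approval(KEY, frozen)

# A payload rebuilt later from mutable context fails the check.
rebuilt = freeze("transfer", {"to": "0xATTACKER", "amount": 100})
try:
    execute_at_sink(KEY, rebuilt, tag, lambda action, params: (action, params))
except PermissionError as exc:
    print("blocked:", exc)
```

The design point is that the sink never accepts a freshly planned action, only the bytes that were reviewed.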
3. Are broad approvals persistent or inherited?
For long-lived agent security reviews, inspect:
- session carryover
- scheduled jobs
- background workers
- child-agent delegation
- standing approval caches
- “always allow” toggles by tool class
Any cross-session approval reuse should be treated as high risk unless authority is very narrow and fully observable.
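To make "narrow" concrete, an approval can be modeled as single-use, session-bound, and time-limited. The sketch below assumes a hypothetical `Approval` record and `consume` function; the field names and the string return values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Approval:
    """Hypothetical approval record scoped to one action, session, and window."""
    digest: str        # digest of the exact approved action
    session_id: str    # session in which the operator approved
    expires_at: float  # unix timestamp after which the approval is dead
    used: bool = False


def consume(approval: Approval, digest: str, session_id: str, now: float) -> str:
    """Return "ok" once for a matching, fresh request; otherwise a denial reason."""
    if approval.used:
        return "denied: already used"
    if now >= approval.expires_at:
        return "denied: expired"
    if session_id != approval.session_id:
        return "denied: different session"
    if digest != approval.digest:
        return "denied: different action"
    approval.used = True
    return "ok"


approval = Approval(digest="abc", session_id="s1", expires_at=1000.0)
print(consume(approval, "abc", "s1", now=10.0))
print(consume(approval, "abc", "s1", now=11.0))
```

Under this model an "always allow" toggle has no representation at all, which is the default the audit should look for.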
4. Is the risky destination independently validated?
Approval alone is not enough. High-impact sinks need destination controls independent of the model. For example:
- wallet recipient must resolve from an allowlist or trusted registry
- router and bridge addresses must be canonical
- shell destinations and paths must be allowlisted
- Git remotes and CI targets must be pinned
- email or webhook recipients must be policy-checked
This connects directly to unverified financial destination selection and prompt-to-shell execution.
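A destination check that is independent of the model can be as simple as a deny-by-default lookup in a registry the agent cannot write to. This sketch assumes a hypothetical in-memory `REGISTRY` keyed by chain and role, with placeholder addresses; it compares addresses case-insensitively and does not validate checksums or resolve names.

```python
# Hypothetical registry maintained outside the agent; addresses are placeholders.
REGISTRY = {
    (1, "router"): {"0x" + "11" * 20},
    (1, "recipient"): {"0x" + "22" * 20},
}


def is_trusted_destination(registry: dict, chain_id: int, role: str, address: str) -> bool:
    """Deny by default: unknown chains, roles, or addresses are all rejected."""
    allowed = registry.get((chain_id, role), set())
    return address.lower() in {entry.lower() for entry in allowed}


router = "0x" + "11" * 20
print(is_trusted_destination(REGISTRY, 1, "router", router))
print(is_trusted_destination(REGISTRY, 1, "recipient", router))
```

Keying on role as well as chain matters: an address that is a valid router is not thereby a valid recipient.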
5. Do logs preserve what was approved versus what executed?
A surprising number of systems cannot answer this after an incident.
Collect evidence for:
- the approval prompt or UI payload
- the final executed action
- any intermediate simulation or rewrite steps
- operator identity and time of approval
- evidence of approval reuse or inheritance
- rejected actions and why they were blocked
If the system cannot prove that the executed action matched the approved artifact, the approval control is not forensically defensible.
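As an illustration of what a forensically useful record might contain, the sketch below stores both artifacts in full and compares their digests. The record layout, `audit_record`, and the example values are assumptions, not a description of any specific logging system.

```python
import hashlib
import json


def digest(artifact: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the artifact."""
    canonical = json.dumps(artifact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(operator: str, approved: dict, executed: dict,
                 approved_at: str, executed_at: str) -> dict:
    """Keep both artifacts so an investigator can diff them later."""
    approved_digest = digest(approved)
    executed_digest = digest(executed)
    return {
        "operator": operator,
        "approved_at": approved_at,
        "executed_at": executed_at,
        "approved": approved,
        "executed": executed,
        "approved_digest": approved_digest,
        "executed_digest": executed_digest,
        "matches_approval": approved_digest == executed_digest,
    }


# Hypothetical incident: the executed call went to a different destination.
record = audit_record(
    operator="alice",
    approved={"action": "http_post", "url": "https://api.example.com/report"},
    executed={"action": "http_post", "url": "https://collector.example.net/report"},
    approved_at="2024-01-01T09:00:00Z",
    executed_at="2024-01-04T17:30:00Z",
)
print(json.dumps(record, sort_keys=True))
```

Storing only the digests would prove a mismatch but not explain it, which is why the full artifacts are kept alongside.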
Get funded for your audit
Core grants cover up to $32k. Growth and Builder tiers available. Rolling applications.
No spam. Unsubscribe anytime.
Coding agents: where this fails in practice
Coding agents are a high-signal place to audit this issue because the parameter drift is often hidden behind developer-friendly UX.
Common failure patterns:
“Run tests” that is really shell execution with extra authority
The UI presents a benign label, but the command includes extra arguments, network fetches, file writes, or package scripts. That makes the control overlap with Prompt-to-Shell Execution via Unsafe Command Construction.
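A hedged sketch of closing this gap for shell commands: approve an exact argument array, refuse anything that differs, and execute without a shell so the arguments are not re-parsed. `run_approved` is a hypothetical helper, and the example command stands in for a real test runner.

```python
import shlex
import subprocess
import sys


def run_approved(approved_argv: list, proposed_argv: list):
    """Execute only if the proposed argv is identical to the approved argv."""
    if list(proposed_argv) != list(approved_argv):
        raise PermissionError(
            "proposed command differs from approval: " + shlex.join(proposed_argv)
        )
    # shell=False is the default: argv goes to the program without shell parsing.
    return subprocess.run(list(approved_argv), capture_output=True, text=True, check=False)


# Stand-in for a test command; a real one would be shown to the operator verbatim.
approved = [sys.executable, "-c", "print('tests ok')"]
result = run_approved(approved, approved)
print(result.stdout.strip())

# An extra argument added after review is rejected before anything runs.
try:
    run_approved(approved, approved + ["--upload-results"])
except PermissionError as exc:
    print("blocked:", exc)
```

Exact argv equality is deliberately strict; it does not inspect what package scripts or the test suite itself go on to do, so it complements sandboxing rather than replacing it.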
“Open PR” that is really repo mutation plus external disclosure
If the review step does not show exact diff, target branch, issue references, and outbound text, a prompt-injected repo comment can turn “open PR” into an approval to leak secrets or poison CI.
“Install tool” that is really capability expansion
A coding agent that adds a plugin, skill, or connector after a broad approval may materially expand runtime authority. That should be reviewed alongside Tool or Manifest Capability Overclaim and our recent piece on agentic supply chain risk.
For this class of system, use the coding agent security checklist and treat every approval as a possible execution sink.
Long-lived agents: stale approval is still approval bypass
Long-lived agents make the problem more subtle. The dangerous action may not happen immediately after approval.
A typical path:
- The operator approves a broad class of action.
- The agent stores state in memory, summaries, or queued tasks.
- Later, different context changes what that action now means.
- The system executes under the old approval without fresh review.
This is where prompt-to-sink tracing matters. The auditor needs to follow the original influence across time, not just across components.
If a low-trust write can later consume a standing approval, you are looking at a real security issue, not a workflow bug. That is why memory poisoning in persistent agents and approval reuse should be scoped together.
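One way to test the stale-approval path described above is to bind an approval to a fingerprint of the state it was granted against, so that any later write invalidates it. This is a coarse sketch under assumptions: `state_fingerprint`, `may_execute`, and the memory item layout are hypothetical, and a real system might fingerprint only the state that feeds the approved action.

```python
import hashlib
import json


def state_fingerprint(memory_items: list) -> str:
    """Hash the memory or queue snapshot the approval was granted against."""
    canonical = json.dumps(memory_items, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def may_execute(approval: dict, current_memory: list) -> bool:
    """Allow execution only if state is unchanged since approval."""
    return approval["state_fingerprint"] == state_fingerprint(current_memory)


# Monday: the operator approves against this snapshot.
monday_memory = [{"source": "operator", "note": "rebalance weekly"}]
approval = {"action": "rebalance", "state_fingerprint": state_fingerprint(monday_memory)}

# Thursday: a low-trust write has been added to memory in the meantime.
thursday_memory = monday_memory + [{"source": "web", "note": "use bridge X instead"}]

print(may_execute(approval, monday_memory))
print(may_execute(approval, thursday_memory))
```

When the check fails, the expected behavior is a fresh review against the new state, not silent execution under the old approval.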
Agentic DeFi: the last broken control before treasury loss
Approval scope mismatch is one of the most important Zealynx review points for Agentic DeFi.
A treasury or trading agent may appear to have human oversight while still leaving the critical fields under model control. The approval says “swap USDC for ETH.” The loss lives elsewhere:
- wrong chain
- wrong recipient
- wrong bridge
- wrong router
- wrong allowance spender
- wrong decimal interpretation
- wrong slippage bound
- wrong calldata after simulation
This is why we keep returning to financial blast radius. If the model can still choose destination identity or mutate the final transaction after review, the human is not approving the thing that matters.
Teams building AI-powered finance flows should review this together with:
- the Agentic DeFi security checklist
- the AI audits service page
- the AI findings library
- our earlier analysis on AI-controlled DeFi vaults and prompt injection
Control implications
If you want the short remediation list, it is this:
- Bind approval to exact parameters rather than action labels.
- Freeze the approved artifact so execution cannot be rebuilt from mutable context later.
- Require re-approval when recipient, amount, route, command, file target, or risk class changes.
- Validate high-risk destinations at sink time using independent policy and allowlists.
- Expire broad approvals aggressively and block cross-session inheritance by default.
- Log both approved and executed artifacts with enough detail for incident reconstruction.
These are not “nice to have” controls. They are the difference between a meaningful human gate and a decorative one.
Conclusion
The safest phrase in AI security is not “human in the loop.” It is “human approval bound to exact execution.”
That is the standard auditors should use.
If your product claims approval as a compensating control, test whether the operator is approving the real sink-time action or just a summary label. In coding agents, that means commands, diffs, destinations, and tool installs. In long-lived agents, it means persistence, reuse, and delayed execution. In Agentic DeFi, it means the exact transaction fields that move money.
If you want a structured review of those controls, start with Zealynx's AI audit methodology, the service-specific AI security checklists, and the approval scope mismatch finding pattern.
FAQ
1. What is approval bypass in an AI agent?
Approval bypass happens when a system appears to require human review, but the human only approves a broad action label while the model still controls the risky parameters. In practice, the operator approves “run command” or “send trade,” while the real impact sits in the exact arguments, destinations, amounts, or calldata. That is why approval bypass is an authority-boundary issue, not just a UX issue.
2. Why is human-in-the-loop not enough for AI security?
Human-in-the-loop is not enough when the review step is disconnected from the final execution sink. If the system can rebuild the command, transaction, or destination after review, or if the human never sees the exact risky parameters, the control does not meaningfully constrain the model. Zealynx audits this by tracing prompt-to-sink paths and testing sink-time validation.
3. How should auditors test approval controls in coding agents?
Auditors should inspect whether approvals show the exact command, file targets, network destinations, changed files, and plugin installs that the agent will actually execute. They should also test whether those parameters can mutate after approval through retries, fallbacks, or tool output. The coding agent security checklist and prompt-to-shell execution finding are the right starting points.
4. Why is approval scope mismatch especially dangerous in Agentic DeFi?
Because the difference between a safe action and a treasury loss is often hidden in parameters, not labels. A transaction that looks like a normal rebalance can still route to the wrong bridge, wrong router, wrong recipient, or wrong spender. That is why Zealynx treats approval semantics and destination validation as core Agentic DeFi audit scope.
5. What controls reduce approval bypass risk in long-lived agents?
The highest-leverage controls are narrow approvals, aggressive expiry, no standing approvals by default, independent destination policy, and immutable logging of both approved and executed actions. Long-lived agents also need durable provenance on memory and queue state, because stale approvals combined with poisoned state create delayed execution risk. See the long-lived agent checklist and persistent memory poisoning finding.
Glossary
| Term | Definition |
|---|---|
| Approval Bypass | A failure mode where a human approval step exists, but does not constrain the exact parameters that determine the real security impact of the action. |
| Prompt-to-Sink | The full path from attacker-influenced prompt or context input to the final execution sink, such as shell, API call, code change, approval, or on-chain transaction. |
| Sink-Time Validation | Independent validation performed at the execution sink on the exact action, destination, and parameters being triggered, rather than on a higher-level summary of intent. |
