Sandbox Escape (Agentic)

An attack where code or commands intended to run inside a constrained sandbox (container, seccomp profile, restricted directory) reach execution outside the constraint — exfiltrating credentials, modifying host files, or pivoting to privileged subsystems.

The pattern is older than agentic AI, but it takes new forms in agentic systems. There, the sandbox boundary is often only implied by configuration (for example, a filesystem MCP server that is told to operate in one directory) rather than enforced at the kernel level, so the subsystems the sandbox was supposed to isolate are protected by nothing stronger than the tool's own input checks.

The CVE record from 2025–2026 includes worked examples. CVE-2025-53109 and CVE-2025-53110 ("EscapeRoute") in the official Anthropic Filesystem MCP server allowed symlink-following and path-prefix bypass to read and write outside the configured root. In both cases the sandbox boundary was enforced at the application layer (path-string comparison) rather than at the kernel layer (chroot, mount-namespace isolation, or openat2 with RESOLVE_BENEATH), and the application-layer check could be bypassed.
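To make the path-prefix class of bug concrete, here is a minimal Python sketch. It is not the Filesystem MCP server's actual code; the root path and both function names are hypothetical. It shows why a plain string-prefix comparison accepts a sibling directory and a traversal path, and how a component-wise comparison closes that particular gap.

```python
import os

# Hypothetical configured root, chosen only for illustration.
ROOT = "/srv/agent-root"

def naive_is_allowed(path: str) -> bool:
    # Flawed check: plain string-prefix comparison on the raw input.
    return path.startswith(ROOT)

def stricter_is_allowed(path: str) -> bool:
    # Lexically normalise first, then compare whole path components.
    if not os.path.isabs(path):
        return False
    normalised = os.path.normpath(path)
    return os.path.commonpath([ROOT, normalised]) == ROOT

# A sibling directory that merely shares the prefix string.
sibling = "/srv/agent-root-secrets/id_rsa"
# A traversal that starts inside the root and climbs back out.
traversal = "/srv/agent-root/../../etc/passwd"
inside = "/srv/agent-root/notes.txt"

if __name__ == "__main__":
    for candidate in (inside, sibling, traversal):
        print(candidate, naive_is_allowed(candidate), stricter_is_allowed(candidate))
```

The stricter check is still an application-layer check. It works on strings only, so it says nothing about symlinks or about the filesystem changing after validation.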

Why Application-Layer Sandboxes Fail

Three properties make application-layer sandboxes unreliable in agentic contexts.

First, the agent's tool inputs are adversarial. Any path string the agent passes to a tool can be crafted by an attacker who controlled the prompt context that produced it. Application-layer path normalisation has decades of bypass research; relying on it inside an agent loop inherits all those bypasses.

Second, symlinks, hardlinks, and filesystem race conditions are not modelled at the application layer. A path that looks safe at validation time can change between validation and use (TOCTOU).

Third, the "sandbox" exists only in the operator's mental model. There is no kernel mechanism preventing escape: the constraint is whatever logic the application implements, which is exactly what the attacker is targeting.
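The validation-versus-use gap can be reproduced in a few lines. The following Python sketch assumes a POSIX system where unprivileged symlinks are allowed; the directory layout and function names are hypothetical. A file passes a symlink-resolving check, is then swapped for a symlink, and the later read returns data from outside the root.

```python
import os
import tempfile

def lexical_check(root: str, path: str) -> bool:
    # String-only check: normalise, then compare path components.
    normalised = os.path.normpath(path)
    return os.path.commonpath([root, normalised]) == root

def resolved_check(root: str, path: str) -> bool:
    # Resolves symlinks, but only reflects the filesystem at call time.
    resolved = os.path.realpath(path)
    return os.path.commonpath([root, resolved]) == root

def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        root = os.path.join(tmp, "root")
        outside = os.path.join(tmp, "outside")
        os.mkdir(root)
        os.mkdir(outside)
        secret = os.path.join(outside, "secret.txt")
        with open(secret, "w") as f:
            f.write("host credential")
        target = os.path.join(root, "notes.txt")
        with open(target, "w") as f:
            f.write("harmless")

        # Time of check: the target is a regular file inside the root.
        ok_at_check = resolved_check(root, target)

        # The attacker swaps the file for a symlink before it is used.
        os.remove(target)
        os.symlink(secret, target)

        # Time of use: the open follows the symlink out of the root.
        with open(target) as f:
            data_read = f.read()

        return {
            "ok_at_check": ok_at_check,
            "lexical_ok_after_swap": lexical_check(root, target),
            "resolved_ok_after_swap": resolved_check(root, target),
            "data_read": data_read,
        }

if __name__ == "__main__":
    print(demo())
```

In a real attack the swap races the check instead of following it in sequence, but the effect is the same: the check answered a question about a filesystem state that no longer exists when the open happens.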

Kernel-Level Sandbox Patterns

Effective sandboxing requires kernel-enforced boundaries that hold even when the application's own logic is buggy. Common patterns: container isolation (Docker, Podman, OCI runtimes) with restricted bind mounts, dropped capabilities, and network namespaces; seccomp profiles restricting which syscalls the sandboxed process can issue; AppArmor/SELinux mandatory access controls limiting which paths and resources the process can reach; openat2 with RESOLVE_BENEATH / RESOLVE_NO_SYMLINKS for filesystem operations that must stay inside a given root; user namespaces isolating UID and GID mappings.
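As an illustration of the filesystem item, the sketch below approximates the intent of RESOLVE_BENEATH and RESOLVE_NO_SYMLINKS in Python. It assumes a POSIX system and assumes openat2 is not directly callable from the Python standard library, so it walks the path one component at a time with dir_fd and O_NOFOLLOW instead. The name open_beneath is hypothetical, and this is a userland approximation rather than a substitute for the kernel flag, which resolves the whole path in a single syscall.

```python
import os
import tempfile

def open_beneath(root_fd: int, relative_path: str, flags: int = os.O_RDONLY) -> int:
    """Open relative_path under root_fd, refusing symlinks and '..' components."""
    parts = relative_path.split("/")
    if relative_path.startswith("/") or any(p in ("", ".", "..") for p in parts):
        raise PermissionError(f"rejected path: {relative_path!r}")
    dir_fd = os.dup(root_fd)
    try:
        for part in parts[:-1]:
            # Each intermediate component must be a real directory, not a symlink.
            next_fd = os.open(
                part, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW, dir_fd=dir_fd
            )
            os.close(dir_fd)
            dir_fd = next_fd
        # The final component is opened relative to its parent, never following a symlink.
        return os.open(parts[-1], flags | os.O_NOFOLLOW, dir_fd=dir_fd)
    finally:
        os.close(dir_fd)

def demo() -> dict:
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "root")
        os.mkdir(root)
        with open(os.path.join(root, "notes.txt"), "w") as f:
            f.write("inside")
        outside = os.path.join(tmp, "secret.txt")
        with open(outside, "w") as f:
            f.write("outside")
        os.symlink(outside, os.path.join(root, "link"))

        root_fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
        try:
            for candidate in ("notes.txt", "link", "../secret.txt"):
                try:
                    fd = open_beneath(root_fd, candidate)
                except OSError as exc:
                    results[candidate] = type(exc).__name__
                else:
                    with os.fdopen(fd) as f:
                        results[candidate] = f.read()
        finally:
            os.close(root_fd)
    return results

if __name__ == "__main__":
    print(demo())
```

Because every open is relative to a directory descriptor that is already held, swapping a path component for a symlink after the check has no effect: the kernel refuses the symlink at open time. This does not cover bind mounts or other ways a directory tree can point outside the root, which is where mount namespaces and container isolation come in.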

For IDE-embedded agents, the additional consideration is that the sandboxed process should not inherit the user's environment — credentials, session tokens, signing keys — that the agent might otherwise reach. The strictest pattern is to run high-authority exec primitives in dedicated, non-credentialed worker processes rather than inheriting the agent host's identity.
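One small part of that pattern, not inheriting the host environment, can be sketched in Python as follows. The allow-list contents and the function name run_in_clean_env are assumptions made for illustration, and the sketch assumes a POSIX host. Scrubbing environment variables does not remove access to credential files on disk, so it complements a separate user account or container rather than replacing one.

```python
import os
import subprocess
import sys

# Hypothetical allow-list: only variables the worker genuinely needs.
ALLOWED_ENV = ("PATH", "LANG")

def run_in_clean_env(argv, cwd=None):
    """Run a command with an allow-listed environment instead of the caller's."""
    env = {key: os.environ[key] for key in ALLOWED_ENV if key in os.environ}
    return subprocess.run(
        argv,
        env=env,              # replaces the inherited environment entirely
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=30,
        check=False,
    )

if __name__ == "__main__":
    # Simulate a token present in the agent host's environment.
    os.environ["FAKE_API_TOKEN"] = "not-a-real-secret"
    probe = "import os; print('FAKE_API_TOKEN' in os.environ)"
    result = run_in_clean_env([sys.executable, "-c", probe])
    print(result.stdout.strip())
```

Passing env explicitly means a variable added to the host later is excluded by default, which is the safer failure mode compared with deleting known-sensitive names from a copy of the environment.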

For deeper operational guidance, see the OWASP ASI05 explainer and the MCP Breach Index 2025–2026, which catalogues the disclosed sandbox-escape CVEs in MCP-ecosystem components.
