Building a Sandboxed Execution System for AI Agents and Untrusted Code

Today

I didn’t really set out to build something like a “sandboxed execution system for AI agents.” It started from something much more ordinary: I just wanted a way to safely run code I didn’t fully trust on a server.

Sometimes it was code generated by AI agents. Sometimes it was automation scripts that looked correct at first glance but had no real guarantees behind them. The obvious answer is always the same:

Just run it in a container.

And for a while, that feels like the end of the story. But once you start using it for iterative, stateful workloads, it starts to break down.

So I started building something more opinionated: not just a way to run code, but a controlled execution environment that could manage state, sessions, and lifecycle around untrusted workloads.

That system became Bastion: a self-hosted sandboxed execution environment for running untrusted code from AI agents and automation tools, with persistent sessions instead of one-off containers.

Check out the project on GitHub!

The problem isn’t really “running code safely”

My first framing of the problem was pretty narrow: isolation. Docker gives you that. Spin up a container, execute code, tear it down. So that should be enough, right? But very quickly, I started running into friction that doesn’t show up in simple setups:

At that point, “just use Docker” starts to feel like it solves only one dimension of the problem, because the real issue wasn’t execution itself. It was persistent, stateful execution, where the environment isn’t disposable anymore but something you manage. Once you think in those terms, the problem stops being simple. You’re no longer just running code. You’re effectively managing a persistent, sandboxed execution environment per session.

The first approaches don’t really scale in your head (or in practice)

The naive version is straightforward: spin up a container per request, run a command (docker run), return stdout, destroy everything.

It works until you try to do anything iterative. The moment you need state, everything breaks down: dependency installs repeat every time, filesystem state disappears after execution, workflows can’t continue across commands, and debugging becomes almost impossible because context resets constantly.
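A minimal sketch of that naive model, assuming the Docker CLI is driven from Python. The helper name `one_shot_argv` and the image `python:3.12-slim` are illustrative choices, not something taken from Bastion:

```python
import shlex


def one_shot_argv(image: str, command: str) -> list[str]:
    """Build a `docker run` invocation for a throwaway container."""
    # --rm removes the container (and its writable layer) when the command exits.
    return ["docker", "run", "--rm", image, "sh", "-c", command]


# Two steps of one workflow become two unrelated containers, so whatever
# the first step installs is gone before the second step starts.
steps = ["pip install requests", "python -c 'import requests'"]
for step in steps:
    print(shlex.join(one_shot_argv("python:3.12-slim", step)))
```

Each printed line could be handed to `subprocess.run`, but nothing survives between the two invocations, which is exactly the friction described above.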

So the next step feels obvious:

keep a container alive per session

This immediately feels closer to what you actually want. Now each session is a long-lived environment, and commands execute inside it.

But this shift quietly introduces a new set of problems:

At this point, I stopped thinking in terms of “a runner”. What I was building started to look more like a system that manages execution environments. That shift changes everything.

The system naturally started organizing itself around sessions

Once you stop thinking in terms of isolated commands and start thinking in terms of environments, a core abstraction emerges pretty quickly:

A session is not a request. It is a long-lived execution environment.

That idea became the center of everything. From there, the system naturally split into layers:

The orchestrator ends up being where the “real system” lives

One thing I didn’t fully appreciate at the beginning is that Docker is not really an orchestration system in the way this problem needs.

It gives you primitives: start a container, exec into a container, attach streams, kill a container.

But everything above that, the logic that makes these primitives meaningful, is missing. So the orchestrator becomes the actual control center.

It’s responsible for things like:

At some point, it started to feel less like “application logic” and more like a lightweight scheduler, something closer to an OS concept, but at the container level. That analogy actually helped reason about it more clearly.
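One way to picture the scheduler-like role is as a small state machine that decides which lifecycle transitions a session may take. The sketch below is an assumption for illustration only: the state names, the `ALLOWED` table, and the `Orchestrator` class are hypothetical and not Bastion's actual design.

```python
from enum import Enum


class SessionState(Enum):
    CREATED = "created"
    RUNNING = "running"
    STOPPED = "stopped"
    REMOVED = "removed"


# Hypothetical transition table: anything not listed is rejected.
ALLOWED = {
    SessionState.CREATED: {SessionState.RUNNING, SessionState.REMOVED},
    SessionState.RUNNING: {SessionState.STOPPED},
    SessionState.STOPPED: {SessionState.RUNNING, SessionState.REMOVED},
    SessionState.REMOVED: set(),
}


class Orchestrator:
    """Tracks the lifecycle state of each session in memory."""

    def __init__(self) -> None:
        self._sessions: dict[str, SessionState] = {}

    def create(self, session_id: str) -> None:
        if session_id in self._sessions:
            raise ValueError(f"session already exists: {session_id}")
        self._sessions[session_id] = SessionState.CREATED

    def transition(self, session_id: str, new_state: SessionState) -> None:
        current = self._sessions[session_id]
        if new_state not in ALLOWED[current]:
            raise ValueError(
                f"illegal transition {current.value} -> {new_state.value}"
            )
        self._sessions[session_id] = new_state

    def state(self, session_id: str) -> SessionState:
        return self._sessions[session_id]
```

In a real system each accepted transition would be paired with the matching Docker primitive (start, stop, remove); here only the bookkeeping is shown.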

Sessions became the central abstraction

A session represents a stable environment where multiple executions happen over time.

This changes how execution itself is modeled. It stops being run → get output → discard, and becomes stateless operations over a persistent, stateful environment.

That tradeoff is powerful, but it introduces its own complexity: concurrency inside a shared environment, consistency of state over time, and isolation between executions that still share a container.
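As a rough sketch, a session record could look like the dataclass below. Every field name here is an assumption chosen for illustration; the source does not describe Bastion's actual schema.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    """A long-lived execution environment, not a single request."""

    container_name: str   # the container that backs this session
    workspace_dir: str    # host directory holding the session's files
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: float = field(default_factory=time.time)
    last_used_at: float = field(default_factory=time.time)
    executions: int = 0

    def record_execution(self) -> None:
        """Note that one more command ran inside this environment."""
        self.executions += 1
        self.last_used_at = time.time()
```

The point of the shape is that individual executions only touch counters and timestamps, while the container and workspace stay fixed for the lifetime of the session.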

Why docker exec started to make more sense than spawning containers

At some point, there are two obvious directions:

  1. spawn a new container per execution
  2. reuse a running container and execute inside it

I tried both directions, and they lead to very different systems. Spawning containers gives clean isolation, but:

Using docker exec inside persistent containers shifts the system in a different direction: state is naturally preserved, execution becomes fast and incremental, and interactive workflows become possible.

But the tradeoff is real: you now have to manage multiple processes inside one environment, resource isolation becomes harder, and orphan processes and cleanup become real problems.

Still, for this kind of system, the tradeoff is worth it because the core requirement is not “one-off execution”, it’s continuous interaction with an evolving environment.
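A minimal sketch of the exec-based model, assuming the Docker CLI is driven from Python. The helper name `exec_argv`, the container name `bastion-demo`, and the `/workspace` working directory are hypothetical:

```python
import shlex


def exec_argv(container: str, command: str, workdir: str = "/workspace") -> list[str]:
    """Build a `docker exec` invocation against an already-running container."""
    return [
        "docker", "exec",
        "--workdir", workdir,  # run the command from the session's working directory
        container,
        "sh", "-c", command,
    ]


# Both steps target the same container, so the second step sees
# whatever the first one left behind on the filesystem.
for step in ["echo hello > note.txt", "cat note.txt"]:
    print(shlex.join(exec_argv("bastion-demo", step)))
```

Nothing is created or destroyed per command, which is where the speed comes from. It is also why process cleanup becomes the orchestrator's job rather than Docker's.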

Filesystem persistence turned out to be the simplest hard decision

Persistence sounds easy until you realize containers are inherently ephemeral. Bind mounts solve a lot immediately:

But they also shift responsibility upward: isolation is now something you must enforce carefully, path traversal becomes your problem rather than Docker’s, and boundaries are enforced at the API and orchestration layer.

This is a recurring pattern in the system: simplicity in one layer usually moves complexity to another.
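To make the path traversal point concrete, here is a minimal sketch of a boundary check the API layer might run before touching a session's bind-mounted directory. The function name and the example workspace path are hypothetical, and the check relies on `Path.is_relative_to`, which needs Python 3.9 or newer.

```python
from pathlib import Path


def resolve_in_workspace(workspace: Path, user_path: str) -> Path:
    """Resolve a user-supplied path and refuse anything outside the workspace."""
    root = workspace.resolve()
    # resolve() collapses ".." segments and follows symlinks that exist,
    # and an absolute user_path replaces root entirely when joined.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes session workspace: {user_path!r}")
    return candidate


workspace = Path("/srv/bastion/sessions/abc")  # hypothetical host directory
for requested in ["src/main.py", "../abc-other/secret", "/etc/passwd"]:
    try:
        print("ok:", resolve_in_workspace(workspace, requested))
    except ValueError as err:
        print("rejected:", err)
```

Comparing path components rather than string prefixes matters here: a sibling directory such as `abc-other` shares a string prefix with `abc` but is not inside it.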

The terminal subsystem made the system feel “alive”

At some point, I wanted more than just command execution. I wanted an actual interactive environment, something that behaves like a shell session. So the system evolved into a WebSocket-based terminal layer:

Client ↔ WebSocket ↔ API ↔ docker exec (TTY mode)

This introduces a different class of problems:

Once this works, the system stops feeling like a “command runner”. It starts feeling like a live environment you can interact with directly.

Concurrency is where the system stops being simple

Single execution per session is straightforward. Multiple concurrent executions inside the same environment is where everything becomes more delicate. Now you have to think about:

All of this applies even though everything still runs inside one container. This is the point where the system stops feeling like a wrapper around Docker and starts feeling like its own execution model.
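One possible way to keep concurrent executions inside a session manageable is to cap how many may run at once and to track which ones are in flight, so that leftovers can be found and cleaned up. This is only a sketch under that assumption: the `SessionSlots` class and the limit of four are hypothetical, not Bastion's mechanism.

```python
import threading
from contextlib import contextmanager


class SessionSlots:
    """Caps and tracks concurrent executions inside one session."""

    def __init__(self, max_concurrent: int = 4) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._running: set[str] = set()

    @contextmanager
    def execution(self, exec_id: str):
        # Fail fast instead of queueing when the session is saturated.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("session is at its concurrent execution limit")
        with self._lock:
            self._running.add(exec_id)
        try:
            yield
        finally:
            # Always release the slot, even if the execution raised.
            with self._lock:
                self._running.discard(exec_id)
            self._slots.release()

    def running(self) -> set[str]:
        """Return a snapshot of the execution ids currently in flight."""
        with self._lock:
            return set(self._running)
```

A caller would wrap each command in `with slots.execution(exec_id): ...`, and `running()` gives the orchestrator a snapshot of what is still active in the shared container.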

Security is not one mechanism, it’s a stack of assumptions

It’s easy to assume containers give you “security”. They don’t, at least not by themselves. So the model becomes layered:

But the important realization is that security here is not a single guarantee. It’s a composition of constraints.

And if any layer is misconfigured, the guarantees degrade quickly. So instead of thinking in terms of a “secure system”, the goal becomes a system designed to contain failure within predictable boundaries.
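As an illustration of what stacking constraints can look like at the container level, the sketch below builds a `docker run` invocation with several standard hardening flags. The specific limits, the `/workspace` mount point, and the `sleep infinity` keep-alive command are assumptions, not Bastion's actual configuration.

```python
def hardened_run_argv(name: str, image: str, workspace_dir: str) -> list[str]:
    """Build a `docker run` invocation for a long-lived, constrained session container."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--cap-drop", "ALL",                       # drop all Linux capabilities
        "--security-opt", "no-new-privileges",     # block privilege escalation via setuid
        "--pids-limit", "256",                     # bound fork bombs
        "--memory", "512m",                        # bound memory use
        "--cpus", "1.0",                           # bound CPU use
        "--network", "none",                       # no network access
        "--read-only",                             # read-only root filesystem
        "--tmpfs", "/tmp",                         # writable scratch space in memory
        "--volume", f"{workspace_dir}:/workspace", # the session's persistent files
        image,
        "sleep", "infinity",                       # keep the container alive for later execs
    ]
```

No single flag is the sandbox. Removing any one of them weakens a different boundary, which is the sense in which the guarantees are a composition.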

Persistence and reconciliation became unavoidable

One assumption I made early on was:

if a container exists, the system state is consistent

That turns out to be false fairly quickly. Containers crash. Machines restart. State drifts out of sync. So on startup, the system has to reconcile reality:

This becomes a simple but critical recovery loop:

  1. load persisted session state
  2. inspect runtime containers
  3. match and reconcile
  4. repair inconsistencies

It’s not exciting, but it’s the kind of thing that determines whether a system feels reliable or fragile.
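The four steps above can be sketched as a pure function that compares persisted state with what the runtime reports and returns a repair plan. The data shapes (a mapping of session id to container name, and a set of running container names) and the action names are assumptions for illustration.

```python
def reconcile(persisted: dict[str, str], running: set[str]) -> dict[str, list[str]]:
    """Compare persisted sessions with running containers and plan repairs.

    persisted maps session id -> expected container name.
    running is the set of container names the runtime reports as alive.
    """
    expected = set(persisted.values())
    return {
        # Sessions whose container is alive: nothing to do.
        "healthy": sorted(s for s, c in persisted.items() if c in running),
        # Sessions whose container is gone: recreate or restart it.
        "restart": sorted(s for s, c in persisted.items() if c not in running),
        # Containers no session claims: candidates for removal.
        "remove_orphans": sorted(running - expected),
    }


plan = reconcile(
    persisted={"s1": "bastion-s1", "s2": "bastion-s2"},
    running={"bastion-s2", "bastion-zombie"},
)
print(plan)
```

Keeping the comparison free of Docker calls makes the recovery logic easy to test. The caller supplies the two inputs and then applies the plan.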

Observability ends up mattering more than expected

At some point, you can’t reason about the system without visibility into it. So everything starts emitting structured logs:

Not because it’s “good engineering practice”, but because without it debugging turns into guessing, and for a system that executes arbitrary code, guessing is not acceptable.
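A minimal sketch of structured logging, assuming one JSON object per line. The event name and field names in the example are hypothetical:

```python
import json
import time


def log_event(event: str, **fields) -> str:
    """Emit one JSON log line and return it."""
    record = {"ts": time.time(), "event": event, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


log_event("exec.finished", session_id="s1", exit_code=0, duration_ms=42)
```

Because every line is machine-parseable and carries a session id, logs from one session can be filtered and ordered without guessing which output belongs to which execution.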

What this ended up teaching me

A few things became clear only after the system started stabilizing:

Where this naturally leads next

Once a single-machine system like this starts working, the questions change:

But that feels like a different class of system entirely.

For now, the more interesting part has been simply getting a single machine to behave like a stable, stateful execution environment that doesn’t fall apart under real usage. Even that turned out to be more subtle than it initially looked.