Local sandbox and execution layer
for autonomous agents on macOS.
Autonomous coding agents need somewhere to run generated code safely. Whether it's an orchestrator like JetBrains Air or Conductor dispatching tasks, or your own agent harness calling an LLM in a loop, the agent needs to execute commands, read output, and iterate without risking the host machine.
Shuru is a local execution layer for this. It spins up ephemeral Linux VMs using Apple's Virtualization.framework and boots in about a second. Your agent harness calls the SDK from the host to exec commands, write files, stream process output, and watch for changes inside the VM. The agent's state stays on your machine. The sandbox is disposable.
import { Sandbox } from "@superhq/shuru"
const sb = await Sandbox.start()
// exec runs a command and returns when it exits
const { stdout, exitCode } = await sb.exec("cat /etc/os-release")
// write files into the sandbox, read them back
await sb.writeFile("/app/server.sh", "while read line; do echo \"echo: $line\"; done")
const content = await sb.readFile("/app/server.sh")
// spawn a long-running process, stream stdout in real-time
const proc = await sb.spawn("sh /app/server.sh")
proc.on("stdout", (data) => process.stdout.write(data))
proc.on("stderr", (data) => process.stderr.write(data))
// write to stdin
proc.write("hello\n")
// watch for file changes inside the VM
await sb.watch("/app", (ev) => {
console.log(ev.event, ev.path) // "modify", "/app/server.sh"
})
await proc.kill()
await sb.stop()

There are two ways agents use sandboxes. The first is running the agent itself inside the VM, like launching Claude Code in a container. The second is using the sandbox as a tool: the agent runs on the host, generates code or commands, sends them to the sandbox for execution, reads the output, and decides what to do next. This is how most agent harnesses actually work.
With the second pattern, agent state (conversation history, tool results, memory) never enters the VM. If execution fails, the agent can retry or start a fresh sandbox without losing context. Your API keys stay on the host. The sandbox is just the agent's hands.
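That recovery property is easy to sketch. The helper below is hypothetical (not part of the SDK), and the `Sb` type stands in for the sandbox handle so the sketch is self-contained; in real code you'd pass `Sandbox.start` as the factory.

```typescript
// hypothetical helper: retry a command across fresh sandboxes.
// the agent's state lives on the host, so a failed run costs
// nothing but the VM it ran in.
type Sb = {
  exec: (cmd: string) => Promise<{ stdout: string; exitCode: number }>
  stop: () => Promise<void>
}

async function execWithRetry(
  startSandbox: () => Promise<Sb>, // e.g. () => Sandbox.start()
  command: string,
  attempts = 2,
): Promise<string> {
  let lastErr: unknown
  for (let i = 0; i < attempts; i++) {
    const sb = await startSandbox()
    try {
      const { stdout, exitCode } = await sb.exec(command)
      if (exitCode === 0) return stdout
      lastErr = new Error(`exit ${exitCode}`)
    } catch (err) {
      lastErr = err
    } finally {
      await sb.stop() // always dispose; the next attempt gets a clean VM
    }
  }
  throw lastErr
}
```

Nothing about the conversation needs to be replayed on retry; only the sandbox is thrown away.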
import Anthropic from "@anthropic-ai/sdk"
import { Sandbox } from "@superhq/shuru"
const client = new Anthropic()
const sb = await Sandbox.start()
const tools = [{
name: "execute",
description: "Run a shell command in the sandbox",
input_schema: {
type: "object",
properties: { command: { type: "string" } },
required: ["command"]
}
}]
const messages = [
{ role: "user", content: "write a python script that finds the 100th prime" }
]
while (true) {
const response = await client.messages.create({
model: "claude-opus-4-6", max_tokens: 4096, tools, messages
})
messages.push({ role: "assistant", content: response.content })
if (response.stop_reason !== "tool_use") break
// execute each tool call in the sandbox
const results = []
for (const block of response.content) {
if (block.type !== "tool_use") continue
const { stdout, stderr } = await sb.exec(block.input.command)
results.push({ type: "tool_result", tool_use_id: block.id, content: stdout + stderr })
}
messages.push({ role: "user", content: results })
}
await sb.stop()

Beyond isolation, one of the biggest risks with giving agents API access is credential theft via prompt injection. An agent reads a poisoned file and gets tricked into exfiltrating your keys, and you don't even notice. The usual advice is to limit what the agent can access. The better answer is that the agent never has the real keys in the first place. Sandboxes are offline by default, and when you do enable networking, all traffic goes through a userspace proxy on the host. Secrets stay on the host and get injected at the proxy layer only on HTTPS requests to domains you've explicitly allowed. The guest only ever sees a placeholder token. Even if the agent is fully compromised, there's nothing to steal.
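To make the mechanism concrete, here's a toy model of the header rewrite. This is purely illustrative, not Shuru's implementation; the types and function are invented for the sketch. The real mapping is declared in config, shown next.

```typescript
// toy model of the host-side proxy's secret swap (illustrative only).
// the guest holds a placeholder; the proxy substitutes the real secret,
// but only for hosts on the allowlist.
type SecretRule = { placeholder: string; real: string; hosts: string[] }

function rewriteAuthHeader(
  host: string,
  headers: Record<string, string>,
  rule: SecretRule,
): Record<string, string> {
  const auth = headers["Authorization"]
  if (!auth || !auth.includes(rule.placeholder)) return headers
  if (!rule.hosts.includes(host)) return headers // not allowed: placeholder goes out as-is
  return { ...headers, Authorization: auth.replace(rule.placeholder, rule.real) }
}

const rule: SecretRule = {
  placeholder: "shuru_tok_19254a8c7f3e0001",
  real: "sk-real-key",
  hosts: ["api.openai.com"],
}

// allowed host: the placeholder is swapped for the real key in transit
rewriteAuthHeader("api.openai.com",
  { Authorization: "Bearer shuru_tok_19254a8c7f3e0001" }, rule)
// any other host: the request leaves with the worthless placeholder
rewriteAuthHeader("evil.example.com",
  { Authorization: "Bearer shuru_tok_19254a8c7f3e0001" }, rule)
```

The key design point is that the substitution happens outside the VM, so no code running in the guest, compromised or not, can observe the real value.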
{
"secrets": {
"API_KEY": {
"from": "OPENAI_API_KEY",
"hosts": ["api.openai.com"]
}
},
"network": {
"allow": ["api.openai.com"]
}
}

$ echo $API_KEY
shuru_tok_19254a8c7f3e0001
# agent makes a request
# proxy swaps the placeholder for the real key
# but only on allowed hosts
$ curl https://api.openai.com/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-d '{"model": "gpt-5", "messages": [...]}'
{"choices": [...]}
# the real key never enters the VM

The part that ties this together for multi-agent workflows is checkpoints. Think of them like git commits for your environment. You set up a base with your dependencies, save it, and any number of agents can fork from that checkpoint independently. Each gets its own isolated copy without rebuilding anything.
Sandboxes start as bare Debian. You install what you need (runtimes, tools, project dependencies) and checkpoint the result. That one-time cost pays for every subsequent fork.
import { Sandbox } from "@superhq/shuru"
// sandboxes start as bare Debian. install what you need, then save
const sb = await Sandbox.start({ allowNet: true })
await sb.exec("apt-get update && apt-get install -y nodejs npm")
await sb.exec("npm install -g @anthropic-ai/claude-code")
await sb.exec("git clone https://github.com/acme/api.git /app")
await sb.exec("cd /app && npm install")
await sb.checkpoint("ready") // saves disk state, stops the VM

Once you have a checkpoint, you can fan out. Each task gets its own VM forked from the saved state. Node, Claude Code, and your project dependencies are already installed. The VM boots in about a second and the agent can start working immediately. Run Claude Code headless inside each sandbox and collect the diffs when they're done.
import { Sandbox } from "@superhq/shuru"
// each issue gets its own VM forked from "ready"
// node, claude code, and deps are already there. boots in ~1s
const issues = [
{ branch: "feat/rate-limit", task: "add rate limiting to POST /api/upload" },
{ branch: "fix/ws-race", task: "fix the race condition in the ws handler" },
{ branch: "feat/api-keys", task: "add API key auth to the middleware" },
]
const opts = {
from: "ready",
allowNet: true,
secrets: {
ANTHROPIC_API_KEY: {
from: "ANTHROPIC_API_KEY",
hosts: ["api.anthropic.com"],
},
},
network: { allow: ["api.anthropic.com"] },
}
const results = await Promise.all(issues.map(async ({ branch, task }) => {
const sb = await Sandbox.start(opts)
await sb.exec(`cd /app && git checkout -b ${branch}`)
// claude code runs the full agent loop: plan, edit, test, iterate
await sb.exec(["claude", "-p", task, "--allowedTools", "Bash,Read,Edit"])
const { stdout: diff } = await sb.exec("cd /app && git diff")
await sb.stop()
return { branch, diff }
})) We're working on a KVM backend for Linux so the same SDK works on servers. If you're building an orchestrator or an agent harness that needs a local execution layer, we'd love to hear what's missing.