Run, fix, repeat
LLMs are bad at exact computation. Arithmetic past a few digits, hashing, token counting, parsing a CSV correctly: they fudge or hallucinate. Worse, you usually only notice once the wrong answer is already in your user’s hands.
Give them Python.
This recipe wires a run_python tool to a sandboxed microVM. The model
writes a program; the sandbox runs it; if it crashes, the model sees the
traceback in its next context window and tries again with corrected code.
By the time the model replies to the user, the answer was actually
computed — not guessed.
One real trace. “What is the 1000th prime number?” (no human in the loop)
1. Model writes import sympy; print(sympy.prime(1000)) → ModuleNotFoundError (the slim image plus Network.blocked means it can’t pip install).
2. It reads the traceback, rewrites in pure stdlib → 7919.
3. Replies “The 1000th prime number is 7919.”
The whole point of the recipe is that step 2 happens automatically. The sandbox isn’t just an executor; it’s the feedback signal the loop uses to decide what to do next.
What it shows
- A standard agentic loop (Loop.loop + onTurnComplete) where each tool call is a sandbox exec. Same shape as basic-usage; the only new thing is what the tool does.
- One sandbox created at the top of the program and reused across every tool call. Boot cost is paid once; subsequent execs are tens of milliseconds.
- Network.blocked on create: the model can’t pip install its way out of the problem. It has to write Python that works with the stdlib.
- Scope-bound destruction: when the program ends, the sandbox is gone.
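The "create once, reuse, destroy with the scope" lifecycle from the list above can be sketched without any library. Everything in this block is a hypothetical stand-in (FakeSandbox, createSandbox, withSandbox); it is not the microsandbox API or the recipe's own code, only an illustration of the shape using try/finally.

```typescript
// Hypothetical stand-in for a sandbox handle; not the microsandbox API.
interface FakeSandbox {
  exec(cmd: string[]): string
  destroyed: boolean
}

let boots = 0 // counts how many times a sandbox was "booted"

const createSandbox = (): FakeSandbox => {
  boots += 1
  const sb: FakeSandbox = {
    destroyed: false,
    exec: (cmd) => {
      if (sb.destroyed) throw new Error("sandbox already destroyed")
      return `ran: ${cmd.join(" ")}`
    },
  }
  return sb
}

// Acquire once, hand the sandbox to `use`, and destroy it when the scope
// exits, whether `use` returns normally or throws.
const withSandbox = <A>(use: (sb: FakeSandbox) => A): { result: A; sandbox: FakeSandbox } => {
  const sb = createSandbox()
  try {
    return { result: use(sb), sandbox: sb }
  } finally {
    sb.destroyed = true
  }
}

// Two tool calls share one sandbox, so the boot cost is paid once.
const { result: outputs, sandbox } = withSandbox((sb) => [
  sb.exec(["python3", "-c", "print(1)"]),
  sb.exec(["python3", "-c", "print(2)"]),
])

console.log(outputs.length, boots, sandbox.destroyed)
// prints: 2 1 true
```

The point of tying destruction to the scope is that no code path, including a thrown error, can leave a sandbox running after the program is done with it.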
The loop, in shape
    export const conversation = (service: LanguageModelService, model: string, sb: SandboxInstance) => {
      const toolkit = Toolkit.make([makeRunPython(sb)])
      const tools = Toolkit.toDescriptors(toolkit)

      // After each turn: tool calls → run them and continue; no tool calls → stop.
      const nextStep = (state: State, turn: Turn.Turn) =>
        Arr.match(Turn.functionCalls(turn), {
          onEmpty: () => stop,
          onNonEmpty: (calls) =>
            Toolkit.executeAll(toolkit.tools, calls).pipe(
              Toolkit.continueWith((results) =>
                Turn.appendTurn(
                  { ...state, index: state.index + 1 },
                  turn,
                  results.map(toFunctionCallOutput),
                ),
              ),
            ),
        })

      return pipe(
        initial,
        loop((state) =>
          Effect.succeed(
            service
              .streamTurn({ history: state.history, model, tools })
              .pipe(onTurnComplete((turn) => Effect.sync(() => nextStep(state, turn)))),
          ),
        ),
      )
    }

This is the exact same harness as basic-usage.
The only difference is what makeRunPython does — every iteration the
model either calls the tool (sandbox runs Python, output appended to
history, loop continues) or doesn’t (the model produced its final
answer, loop stops).
The “self-correction” isn’t anywhere in this code. It falls out for
free: the tool output is in history, the next streamTurn sees it, and
the model decides whether it’s done or wants to try again.
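To make the "falls out for free" claim concrete, here is a dependency-free sketch of the same idea. It does not use the recipe's Loop, Toolkit, or Turn modules; fakeModel, fakeExec, and runLoop are hypothetical names, and the model and sandbox are scripted to replay the prime-number trace. Note that the loop has no retry branch: it only appends tool output to the history and asks the model again.

```typescript
// All names here are hypothetical; this is not the recipe's Loop/Toolkit API.
type Message =
  | { role: "user"; text: string }
  | { role: "assistant"; text: string }
  | { role: "tool_call"; code: string }
  | { role: "tool_result"; exitCode: number; stdout: string; stderr: string }

type ModelTurn = { kind: "call"; code: string } | { kind: "final"; text: string }

// Scripted stand-in for the model: it decides only from the history it is given.
const fakeModel = (history: Message[]): ModelTurn => {
  const last = history[history.length - 1]
  if (last?.role === "user") {
    return { kind: "call", code: "import sympy; print(sympy.prime(1000))" }
  }
  if (last?.role === "tool_result") {
    return last.exitCode !== 0
      ? { kind: "call", code: "# pure-stdlib rewrite goes here" }
      : { kind: "final", text: `The 1000th prime number is ${last.stdout}.` }
  }
  return { kind: "final", text: "done" }
}

// Scripted stand-in for the sandbox: the sympy import fails, anything else succeeds.
const fakeExec = (code: string) =>
  code.includes("sympy")
    ? { exitCode: 1, stdout: "", stderr: "ModuleNotFoundError: No module named 'sympy'" }
    : { exitCode: 0, stdout: "7919", stderr: "" }

// No retry logic: run the tool if asked, append the result, and continue.
const runLoop = (question: string): Message[] => {
  const history: Message[] = [{ role: "user", text: question }]
  for (let i = 0; i < 10; i++) { // hard cap so a confused model cannot loop forever
    const turn = fakeModel(history)
    if (turn.kind === "final") {
      history.push({ role: "assistant", text: turn.text })
      return history
    }
    history.push({ role: "tool_call", code: turn.code })
    history.push({ role: "tool_result", ...fakeExec(turn.code) })
  }
  return history
}

const trace = runLoop("What is the 1000th prime number?")
console.log(trace.map((m) => m.role).join(" > "))
// prints: user > tool_call > tool_result > tool_call > tool_result > assistant
```

The second tool call exists only because the first tool_result, with its non-zero exit code, was in the history when the model was asked again.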
The tool, in shape
    const makeRunPython = (sb: SandboxInstance) =>
      Tool.make({
        name: "run_python",
        description:
          "Run a Python program inside a sandboxed microVM. Returns exit code, stdout, and stderr.",
        inputSchema: Tool.fromEffectSchema(Schema.Struct({ code: Schema.String })),
        run: ({ code }) =>
          sb.exec({ cmd: ["python3", "-c", code] }).pipe(
            Effect.map((r) => ({
              exitCode: r.exitCode,
              stdout: r.stdout.trim(),
              stderr: r.stderr.trim(),
              durationMs: r.durationMs,
            })),
          ),
        strict: true,
      })

The closure captures sb so every invocation hits the same sandbox.
The returned record is what the model sees on its next turn — the
non-zero exitCode and the stderr traceback are exactly the feedback
that makes self-correction possible.
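As an illustration of what that record might look like for a failed run, the sketch below applies the same field mapping as a plain function. ExecResult and toRecord are hypothetical names, the failing result is made up to resemble the sympy attempt, and serializing the record as JSON is an assumption about how it reaches the model, not something the recipe states.

```typescript
// Hypothetical raw result shape, mirroring the fields the tool reads from sb.exec.
interface ExecResult {
  exitCode: number
  stdout: string
  stderr: string
  durationMs: number
}

// Same field mapping as the tool's Effect.map step, written as a plain function.
const toRecord = (r: ExecResult) => ({
  exitCode: r.exitCode,
  stdout: r.stdout.trim(),
  stderr: r.stderr.trim(),
  durationMs: r.durationMs,
})

// A made-up failing run, shaped like the sympy attempt from the trace.
const failed: ExecResult = {
  exitCode: 1,
  stdout: "",
  stderr:
    'Traceback (most recent call last):\n  File "<string>", line 1, in <module>\n' +
    "ModuleNotFoundError: No module named 'sympy'\n",
  durationMs: 42,
}

const record = toRecord(failed)
// Assumption: the record is handed to the model as JSON text in a tool-result message.
const payload = JSON.stringify(record)

console.log(record.exitCode !== 0, record.stderr.split("\n").pop())
// prints: true ModuleNotFoundError: No module named 'sympy'
```

The last line of the traceback names the missing module, which is the detail a model needs in order to switch to a stdlib-only approach.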
Run it
You’ll need the msb daemon running (npx microsandbox install once,
then msb server start) on Linux/KVM or macOS/Apple Silicon, plus an
API key in one of ANTHROPIC_API_KEY (default), OPENAI_API_KEY, or
GOOGLE_API_KEY.
    # install once
    pnpm -C recipes-extras/sandbox-code-interpreter install --ignore-workspace

    # run (pass --provider openai|google to switch)
    ANTHROPIC_API_KEY=sk-... \
      ./recipes-extras/sandbox-code-interpreter/node_modules/.bin/tsx \
      recipes-extras/sandbox-code-interpreter/run.ts

The unusual --ignore-workspace flag and the direct tsx invocation
are explained in recipes-extras/README.md.
Short version: this recipe lives outside the pnpm workspace so its
heavy native deps stay out of the monorepo’s root node_modules.