
Recipe: Auto memory compaction

Scenario. A multi-turn conversation grows. Once the running history crosses a turn or token budget, summarize all but the last few items via the model and replace them with the summary. Then keep going.

The driver here is a queue of pending user prompts: after each assistant turn the body injects the next prompt into the history; when the queue is empty, the loop stops. This keeps the recipe focused on the compaction mechanic itself rather than tool-calling.
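This prompt-queue mechanic is easy to show in isolation. The sketch below uses the recipe's field names (`history`, `pendingPrompts`, `turnIndex`, `cumulativeInputTokens`), but the `Item` shape is simplified to a plain object and `injectNextPrompt` is an illustrative helper, not part of the SDK:

```ts
// Simplified stand-in for the SDK's Item type.
type Item = { role: "user" | "assistant"; text: string }

interface State {
  readonly history: ReadonlyArray<Item>
  readonly turnIndex: number
  readonly cumulativeInputTokens: number
  readonly pendingPrompts: ReadonlyArray<string>
}

// After an assistant turn: inject the next queued prompt into history.
// An empty queue (undefined result) is the signal to stop the loop.
const injectNextPrompt = (state: State): State | undefined => {
  if (state.pendingPrompts.length === 0) return undefined
  const [next, ...rest] = state.pendingPrompts
  return {
    ...state,
    history: [...state.history, { role: "user", text: next }],
    pendingPrompts: rest,
  }
}
```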

What it shows

  • Threading recipe-defined fields through state (turnIndex, cumulativeInputTokens, pendingPrompts) - the loop primitive doesn’t care what the state record looks like.
  • Branching the loop body on a state predicate (shouldCompact) to take one of two paths in a given iteration: a normal turn, or a compaction step.
  • Issuing a separate Responses.streamTurn inside the body (the compaction call) and using its result to rewrite history before the next iteration.
  • Using Turn.assistantMessages(turn) to extract the model’s textual response from the assembled Turn.

The two branches

```ts
loop((state) =>
  Effect.gen(function* () {
    const oai = yield* Responses
    if (shouldCompact(state)) {
      // Compaction step: summarize the early history, replace it.
      const toCompact = state.history.slice(0, -KEEP_RECENT_ITEMS)
      return oai
        .streamTurn([...toCompact, Items.userText("Summarize the conversation above...")], {
          tools: [],
          reasoning: { effort: "low" },
        })
        .pipe(
          streamUntilComplete((turn) =>
            Effect.sync(() =>
              nextAfter(Stream.empty, withSummary(state /* extract text from turn */)),
            ),
          ),
        )
    }
    // Normal turn: stream a response, inject the next user prompt or stop.
    return oai.streamTurn(state.history, { tools: [] }).pipe(
      streamUntilComplete((turn) =>
        Effect.sync(() => {
          const next = advance(state, turn)
          if (state.pendingPrompts.length === 0) return stop
          const [nextPrompt, ...rest] = state.pendingPrompts
          return nextAfter(Stream.empty, {
            ...next,
            history: [...next.history, Items.userText(nextPrompt!)],
            pendingPrompts: rest,
          })
        }),
      ),
    )
  }),
)
```
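The compaction branch rewrites state via `withSummary`, whose body isn't shown above. A hedged sketch of the history rewrite it plausibly performs, with the `Item` shape simplified and the summary text assumed to come from `Turn.assistantMessages(turn)`: drop everything before the retained tail and splice in a single summary item.

```ts
// Simplified stand-in for the SDK's Item type.
type Item = { role: "user" | "assistant"; text: string }

const KEEP_RECENT_ITEMS = 4

// Replace all but the last KEEP_RECENT_ITEMS items with one summary item.
// (The recipe's withSummary takes the whole state; this sketch shows only
// the history rewrite.)
const withSummary = (
  history: ReadonlyArray<Item>,
  summaryText: string,
): ReadonlyArray<Item> => [
  { role: "assistant", text: `Summary of earlier conversation: ${summaryText}` },
  ...history.slice(-KEEP_RECENT_ITEMS),
]
```

Note that the next iteration's model call sees the summary as ordinary history: nothing downstream needs to know compaction happened.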

Beyond a single loop: across user sessions

The recipe compacts within one loop invocation - one SDK process, one in-memory State. Real chat applications usually have a different shape: each user message is a fresh request, the agent runs a short loop, and the conversation history is persisted between requests (database, KV store, file). Compaction at that scale is the same mechanism applied at a different boundary.

The pieces:

  • Persist state.history (and any tracking fields you care about, like cumulativeInputTokens) when a session-level loop ends. Item is JSON-serializable, so this is JSON.stringify(history) plus a row in your storage layer keyed by conversation id.
  • Hydrate on the next request: load the row, build the loop’s initial state from it, run the agent loop for that request, save the resulting state at the end.
  • Decide when to compact. Three reasonable points:
    • Lazy, at load time. If the hydrated history exceeds your budget, run a single compaction streamTurn before starting the agent loop, then continue with the compacted history.
    • Eager, at save time. When the loop finishes a request, check the budget; compact and persist the smaller history.
    • Background. After the user-facing response returns, kick off compaction asynchronously and overwrite the stored history. Best for latency-sensitive UIs.
```ts
// Sketch - per request:
const stored = yield* loadHistory(conversationId)
const start: State = { history: stored, /* ... */ }
const ready = shouldCompact(start)
  ? yield* compact(start) // same summarize-via-streamTurn shape as the recipe
  : start
const finalState = yield* Stream.runFold(
  pipe(ready, loop(/* body */)),
  ready,
  /* track final state from emitted Cursor or by other means */
)
yield* saveHistory(conversationId, finalState.history)
```
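The persistence half is ordinary serialization. A minimal sketch, using an in-memory `Map` as a stand-in for the database or KV store, and dropping the Effect wrapper the recipe's `loadHistory`/`saveHistory` would have; `Item` is again simplified:

```ts
// Simplified stand-in for the SDK's Item type.
type Item = { role: "user" | "assistant"; text: string }

// Stand-in for your storage layer, keyed by conversation id.
const storage = new Map<string, string>()

// Item is JSON-serializable, so persistence is a stringify/parse round-trip.
const saveHistory = (conversationId: string, history: ReadonlyArray<Item>): void => {
  storage.set(conversationId, JSON.stringify(history))
}

const loadHistory = (conversationId: string): ReadonlyArray<Item> => {
  const row = storage.get(conversationId)
  return row === undefined ? [] : (JSON.parse(row) as Item[])
}
```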

Tuning knobs

  • MAX_TURNS / MAX_INPUT_TOKENS - when compaction fires.
  • KEEP_RECENT_ITEMS - how many trailing items survive verbatim.
  • The summarization prompt and model - swap for a cheaper model, change the instruction, etc.

Run it

```sh
OPENAI_API_KEY=sk-... pnpm tsx recipes/auto-compaction/index.ts
```

The full source lives next to this README at index.ts.