# Recipe: Auto memory compaction
Scenario. A multi-turn conversation grows. Once the running history crosses a turn or token budget, summarize all but the last few items via the model and replace them with the summary. Then keep going.
The driver here is a queue of pending user prompts: after each assistant turn the body injects the next prompt into the history; when the queue is empty, the loop stops. This keeps the recipe focused on the compaction mechanic itself rather than tool-calling.
## What it shows
- Threading recipe-defined fields through state (`turnIndex`, `cumulativeInputTokens`, `pendingPrompts`, sketched just below) - the loop primitive doesn’t care what the state record looks like.
- Branching the loop body on a state predicate (`shouldCompact`) to take one of two paths in a given iteration: a normal turn, or a compaction step.
- Issuing a separate `Responses.streamTurn` inside the body (the compaction call) and using its result to rewrite history before the next iteration.
- Using `Turn.assistantMessages(turn)` to extract the model’s textual response from the assembled `Turn`.
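The exact state record lives in `index.ts`; the sketch below is only illustrative - the field names come from the recipe, but the budget constants, their values, and the shape of `shouldCompact` are assumptions.

```ts
// Illustrative sketch, not the recipe's exact definitions.
// `Item` is the SDK's JSON-serializable conversation-item type; import it
// from wherever your setup exposes it.
const MAX_TURNS = 20             // assumed: compact after this many turns...
const MAX_INPUT_TOKENS = 50_000  // ...or once this much input has been consumed
const KEEP_RECENT_ITEMS = 4      // trailing items that survive compaction verbatim

interface State {
  readonly history: ReadonlyArray<Item>           // running conversation items
  readonly turnIndex: number                      // completed assistant turns
  readonly cumulativeInputTokens: number          // summed input-token usage so far
  readonly pendingPrompts: ReadonlyArray<string>  // queued user prompts driving the loop
}

// Fire when either budget is exceeded and there is still early history to fold away.
const shouldCompact = (state: State): boolean =>
  (state.turnIndex >= MAX_TURNS || state.cumulativeInputTokens >= MAX_INPUT_TOKENS) &&
  state.history.length > KEEP_RECENT_ITEMS
```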
## The two branches
```ts
loop((state) =>
  Effect.gen(function* () {
    const oai = yield* Responses

    if (shouldCompact(state)) {
      // Compaction step: summarize the early history, replace it.
      const toCompact = state.history.slice(0, -KEEP_RECENT_ITEMS)
      return oai
        .streamTurn([...toCompact, Items.userText("Summarize the conversation above...")], {
          tools: [],
          reasoning: { effort: "low" },
        })
        .pipe(
          streamUntilComplete((turn) =>
            Effect.sync(() =>
              nextAfter(Stream.empty, withSummary(state /* extract text from turn */)),
            ),
          ),
        )
    }

    // Normal turn: stream a response, inject the next user prompt or stop.
    return oai.streamTurn(state.history, { tools: [] }).pipe(
      streamUntilComplete((turn) =>
        Effect.sync(() => {
          const next = advance(state, turn)
          if (state.pendingPrompts.length === 0) return stop
          const [nextPrompt, ...rest] = state.pendingPrompts
          return nextAfter(Stream.empty, {
            ...next,
            history: [...next.history, Items.userText(nextPrompt!)],
            pendingPrompts: rest,
          })
        }),
      ),
    )
  }),
)
```
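The compaction branch elides how the summary text gets pulled off the completed turn and folded back into history (`withSummary(state /* extract text from turn */)`). One plausible shape for that helper - an assumption, not the recipe's code, which lives in `index.ts` - assuming `Turn.assistantMessages(turn)` returns the assistant's text for the turn and that the summary replaces everything except the trailing `KEEP_RECENT_ITEMS` items:

```ts
// Sketch only - the signature and accessor usage here are assumptions.
const withSummary = (state: State, turn: Turn): State => ({
  ...state,
  history: [
    // One user item carrying the model-written summary stands in for the early history.
    Items.userText(
      `Summary of the conversation so far:\n${Turn.assistantMessages(turn).join("\n")}`,
    ),
    // The most recent items survive verbatim.
    ...state.history.slice(-KEEP_RECENT_ITEMS),
  ],
})
```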
## Beyond a single loop: across user sessions

The recipe compacts within one loop invocation - one SDK process, one in-memory `State`. Real chat applications usually have a different shape: each user message is a fresh request, the agent runs a short loop, and the conversation history is persisted between requests (database, KV store, file). Compaction at that scale is the same mechanism applied at a different boundary.
The pieces:
- Persist `state.history` (and any tracking fields you care about, like `cumulativeInputTokens`) when a session-level loop ends. `Item` is JSON-serializable, so this is `JSON.stringify(history)` plus a row in your storage layer keyed by conversation id (a stand-in sketch follows this list).
- Hydrate on the next request: load the row, build the loop’s initial `State` from it, run the agent loop for that request, save the resulting state at the end.
- Decide when to compact. Three reasonable points:
  - Lazy, at load time. If the hydrated history exceeds your budget, run a single compaction `streamTurn` before starting the agent loop, then continue with the compacted history.
  - Eager, at save time. When the loop finishes a request, check the budget; compact and persist the smaller history.
  - Background. After the user-facing response returns, kick off compaction asynchronously and overwrite the stored history. Best for latency-sensitive UIs.
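`loadHistory`, `saveHistory`, and `compact` in the sketch below are left to your application; as a stand-in, the persistence pair can be as small as a keyed JSON blob. The in-memory `Map` here is just a placeholder for your database or KV client, and `Item` is the SDK's conversation-item type:

```ts
import { Effect } from "effect"

// Placeholder storage - swap the Map for your database or KV client.
const rows = new Map<string, string>()

// `Item` is JSON-serializable, so persisting history is a stringify plus a keyed write.
const saveHistory = (conversationId: string, history: ReadonlyArray<Item>) =>
  Effect.sync(() => {
    rows.set(conversationId, JSON.stringify(history))
  })

const loadHistory = (conversationId: string): Effect.Effect<ReadonlyArray<Item>> =>
  Effect.sync(() => {
    const row = rows.get(conversationId)
    return row ? (JSON.parse(row) as ReadonlyArray<Item>) : []
  })
```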
```ts
// Sketch - per request:
const stored = yield* loadHistory(conversationId)
const start: State = { history: stored, /* ... */ }

const ready = shouldCompact(start)
  ? yield* compact(start) // same summarize-via-streamTurn shape as the recipe
  : start

const finalState = yield* Stream.runFold(
  pipe(ready, loop(/* body */)),
  ready,
  /* track final state from emitted Cursor or by other means */
)

yield* saveHistory(conversationId, finalState.history)
```

## Tuning knobs
- `MAX_TURNS` / `MAX_INPUT_TOKENS` - when compaction fires.
- `KEEP_RECENT_ITEMS` - how many trailing items survive verbatim.
- The summarization prompt and model - swap for a cheaper model, change the instruction, etc.
## Run it
```sh
OPENAI_API_KEY=sk-... pnpm tsx recipes/auto-compaction/index.ts
```

The full source lives next to this README at `index.ts`.