# Model retry
Retries are not an agent framework feature. They are error policy around one stream.
This recipe shows the retry shape inline in a normal loop body: retry transient provider failures with exponential backoff, while letting semantic or permanent failures cross the boundary immediately.
**Scenario.** A streamed model turn hits `RateLimited`, `Unavailable`, or `Timeout`. Wait a bit and try again. If the failure is `ContentFiltered`, `AuthFailed`, `InvalidRequest`, or another non-transient error, fail loudly instead of burning retries.
## The design move

`Stream.retry` retries any failure it sees. The trick is to make sure it only sees failures you actually want retried.
The stream is temporarily lifted into a small local union:

- model events become `{ _tag: "Event", event }`;
- retryable errors fail as `Retryable`;
- non-retryable errors become `{ _tag: "Terminal", cause }`, a value that escapes the retry layer.
After the retry schedule finishes, the recipe unwraps everything back into the plain `Stream<TurnEvent, AiError>` the rest of the loop expects.
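As a rough sketch of that union (with `TurnEvent` and `AiError` reduced to stand-in shapes; the recipe's real types live in `index.ts`):

```typescript
// Stand-in shapes for illustration only; the recipe's real TurnEvent and
// AiError are richer tagged types.
type TurnEvent = { readonly text: string };
type AiError = { readonly _tag: string; readonly message: string };

// The temporary local union: events ride through as values, and
// non-retryable failures are smuggled past the retry layer as Terminal.
type Item =
  | { readonly _tag: "Event"; readonly event: TurnEvent }
  | { readonly _tag: "Terminal"; readonly cause: AiError };

// Unwrapping an Item restores the original value/error split.
// In the recipe the error branch is Stream.fail(item.cause).
const unwrap = (item: Item): TurnEvent => {
  if (item._tag === "Event") return item.event;
  throw item.cause;
};
```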
## The retryable set

Three `AiError` tags get retried. Everything else propagates as-is.
| Tag | Retried? | Why |
|---|---|---|
| `RateLimited` | ✓ | transient — provider is asking us to wait |
| `Unavailable` | ✓ | transient — transport / 5xx / DNS |
| `Timeout` | ✓ | transient — slow request |
| `ContentFiltered` | ✗ | the request itself was rejected |
| `AuthFailed` | ✗ | wrong key / wrong scope — won’t fix itself |
| `InvalidRequest` | ✗ | schema / arg error — won’t fix itself |
| `ContextLengthExceeded` | ✗ | needs compaction, not a retry |
| `Cancelled` | ✗ | caller asked for it |
| `IncompleteTurn` | ✗ | provider broke contract |
| `GenerationFailed` | ✗ | mid-generation provider error |
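A minimal version of the predicate behind this table (assuming only that errors carry a `_tag` field, as `AiError` does in the recipe) could look like:

```typescript
// The three transient tags from the table above; everything else
// propagates untouched.
const RETRYABLE_TAGS = new Set(["RateLimited", "Unavailable", "Timeout"]);

const isRetryable = (error: { readonly _tag: string }): boolean =>
  RETRYABLE_TAGS.has(error._tag);
```

Keeping the set explicit makes the retry policy auditable: adding a tag to the table means adding it here, nowhere else.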
## The pipeline

```ts
streamTurn(req).pipe(
  Stream.map((event): Item => ({ _tag: "Event", event })),
  Stream.catchIf(
    isRetryable,
    (cause) => Stream.fail(new Retryable({ cause })), // retried
    (cause) => Stream.succeed<Item>({ _tag: "Terminal", cause }), // escapes retry
  ),
  Stream.retry(backoff),
  Stream.catchTag("Retryable", (e) => Stream.fail(e.cause)),
  Stream.flatMap((item) =>
    item._tag === "Event"
      ? Stream.succeed(item.event)
      : Stream.fail(item.cause),
  ),
)
```

Reading the pipeline:
- Keep normal turn events as values.
- Convert only retryable provider errors into the failure type consumed by `Stream.retry`.
- Smuggle non-retryable errors past the retry layer as terminal values.
- Restore the original `AiError` channel before handing the stream downstream.
Downstream still sees a plain model turn stream. Retry is a local policy, not a new abstraction that leaks through the rest of the program.
## The schedule

```ts
const backoff = Schedule.exponential("200 millis", 2).pipe(
  Schedule.both(Schedule.recurs(3)),
  Schedule.jittered,
)
```

This means 200ms, 400ms, 800ms, capped at three retries (four total tries), with jitter so many clients do not retry in lockstep.
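The base delays before jitter follow simple doubling; a hypothetical helper mirroring the schedule's arithmetic:

```typescript
// Mirrors exponential backoff with a base delay, a growth factor, and a
// capped retry count: the wait before retry i is baseMs * factor ** i.
// Jitter would then scale each delay by a random factor.
const backoffDelays = (
  baseMs: number,
  factor: number,
  maxRetries: number,
): number[] =>
  Array.from({ length: maxRetries }, (_, i) => baseMs * factor ** i);

// backoffDelays(200, 2, 3) yields the 200ms, 400ms, 800ms sequence above.
```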
Tune the constants for your product. If `RateLimited` and `Unavailable` should use different policies, split `Retryable` into separate tagged errors and run them through different retry layers.
## Caveat: stream replay

`Stream.retry` reruns the entire stream. If the provider emitted deltas before the failure, those deltas can be replayed on the next attempt.
For rate limits and transport failures that happen before the first delta, this is exactly what you want. For mid-stream failures where the UI must never see a delta twice, retry at the request boundary instead: use `LanguageModel.turn` inside `Effect.retry`, then materialize the completed turn as a synthetic stream. You lose live streaming inside an attempt, but you get at-most-once forwarding.
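A sketch of the request-boundary alternative, written as a plain async wrapper rather than the Effect combinators (the `attempt` thunk stands in for a full non-streamed turn, e.g. `LanguageModel.turn`; `isTransient` is the same retryable/terminal split as above):

```typescript
// Retry the whole request; output is forwarded only after an attempt
// fully succeeds, so no delta is ever emitted twice.
async function retryRequest<T>(
  attempt: () => Promise<T>,
  opts: {
    retries: number;
    baseMs: number;
    isTransient: (e: unknown) => boolean;
  },
): Promise<T> {
  for (let tries = 0; ; tries++) {
    try {
      return await attempt(); // the complete turn, not a live stream
    } catch (err) {
      // terminal errors and exhausted budgets propagate immediately
      if (tries >= opts.retries || !opts.isTransient(err)) throw err;
      // exponential backoff between attempts
      await new Promise((r) => setTimeout(r, opts.baseMs * 2 ** tries));
    }
  }
}
```

The completed turn can then be replayed to the UI as a synthetic stream, which is the at-most-once trade-off described above.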
## Run it

```sh
OPENAI_API_KEY=sk-... pnpm tsx recipes/model-retry/run.ts
```

The runner just drives a single conversation against OpenAI; retries will only fire if the API actually returns a retryable failure during the run. The unit tests in `index.test.ts` cover the retry behavior offline against a flaky in-memory model.

The full source lives next to this README at `index.ts`.