# OpenAI Responses
The Responses provider wraps OpenAI’s `/v1/responses` SSE endpoint and
maps it onto the core `LanguageModelService` shape. Reasoning models,
tool calls, and response storage are all first-class via the typed
`ResponsesRequestOptions`.
## Install

```shell
pnpm add @betalyra/effect-uai-core @betalyra/effect-uai-responses effect
```

## Wire it up
```ts
import { Config, Effect, Layer } from "effect"
import { FetchHttpClient } from "effect/unstable/http"
import { Responses, layer as responsesLayer } from "@betalyra/effect-uai-responses"

const provider = Layer.unwrap(
  Effect.gen(function* () {
    const apiKey = yield* Config.redacted("OPENAI_API_KEY")
    return responsesLayer({ apiKey, model: "gpt-5.4-mini" })
  }),
)

const runtime = provider.pipe(Layer.provide(FetchHttpClient.layer))
```

`responsesLayer` registers two service tags from one underlying
implementation:
- `Responses` - the typed tag. Yield this when you want Responses-specific options (`reasoning.effort`, `store`, `previousResponseId`).
- `LanguageModel` - the generic tag. Yield this in provider-portable code; only `CommonRequestOptions` is accepted at the call site.
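The two-tag split can be illustrated in isolation. The sketch below uses local stand-in types, not the real definitions from `@betalyra/effect-uai-core` or `@betalyra/effect-uai-responses`; the point is only that one implementation is exposed through a generic surface that narrows the accepted options and a typed surface that accepts all of them:

```ts
// Local stand-ins for illustration only; the real types live in
// @betalyra/effect-uai-core and @betalyra/effect-uai-responses.
interface CommonRequestOptions {
  readonly temperature?: number
}
interface ResponsesRequestOptions extends CommonRequestOptions {
  readonly reasoning?: { readonly effort: "low" | "medium" | "high" }
}

// One underlying implementation...
const callModel = (opts: ResponsesRequestOptions) => opts

// ...exposed through two surfaces: the generic one only accepts the
// common options, the typed one accepts everything.
const viaLanguageModel = (opts: CommonRequestOptions) => callModel(opts)
const viaResponses = (opts: ResponsesRequestOptions) => callModel(opts)
```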
## Config

```ts
interface Config {
  readonly apiKey: Redacted.Redacted
  readonly model: OpenAIModel
  readonly baseUrl?: string // defaults to https://api.openai.com/v1
}
```

`apiKey` is always `Redacted.Redacted` - never a raw string. Read it
with `Config.redacted("OPENAI_API_KEY")` or wrap manually with
`Redacted.make`.
`baseUrl` exists for proxies, Azure, and local LLM gateways that speak
the Responses protocol. Most apps leave it unset.
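As a sketch of the gateway case: the `Config` interface is stubbed locally here (the real one comes from the package and uses effect's `Redacted`), and the localhost URL is a placeholder for whatever gateway you run:

```ts
// Sketch only: Config and Redacted are stubbed locally for illustration.
type Redacted = { readonly _tag: "Redacted"; readonly value: string }
const make = (value: string): Redacted => ({ _tag: "Redacted", value })

interface Config {
  readonly apiKey: Redacted
  readonly model: string
  readonly baseUrl?: string
}

// Point the provider at a local gateway that speaks the Responses protocol.
const local: Config = {
  apiKey: make(process.env.OPENAI_API_KEY ?? "sk-placeholder"),
  model: "gpt-5.4-mini",
  baseUrl: "http://localhost:8080/v1", // hypothetical gateway address
}
```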
## Request options

```ts
interface ResponsesRequestOptions extends CommonRequestOptions {
  readonly reasoning?: { readonly effort: "low" | "medium" | "high" }
  readonly store?: boolean
  readonly previousResponseId?: string
}
```

On top of the core `CommonRequestOptions` (`tools`, `toolChoice`,
`temperature`, `maxOutputTokens`):
- `reasoning.effort` - reasoning depth for `gpt-5.x` models. With `effort` set, the model produces reasoning tokens before any output tokens, so streaming text deltas don’t start immediately. Drop it for latency-sensitive flows.
- `store` - persist the response on OpenAI’s side so it can be referenced via `previousResponseId` on a later turn.
- `previousResponseId` - resume from a stored response without re-sending the full history. See the pause and resume recipe.
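The `store`/`previousResponseId` pairing can be sketched as two option objects. The interface is stubbed locally and `resp_abc123` is a placeholder for the id the first turn would return:

```ts
// Stubbed locally for illustration; the real interface is shown above.
interface ResponsesRequestOptions {
  readonly store?: boolean
  readonly previousResponseId?: string
}

// Turn 1: ask OpenAI to persist the response server-side.
const firstTurn: ResponsesRequestOptions = { store: true }

// Turn 2: resume from the stored response instead of re-sending history.
const resumeFrom = (responseId: string): ResponsesRequestOptions => ({
  previousResponseId: responseId,
})

const second = resumeFrom("resp_abc123") // placeholder id from turn 1
```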
## Calling it

```ts
import { Effect, Stream } from "effect"
import { Responses } from "@betalyra/effect-uai-responses"

const turn = Effect.gen(function* () {
  const oai = yield* Responses
  return oai.streamTurn(history, {
    tools,
    reasoning: { effort: "low" },
  })
})
```

`streamTurn` returns `Stream<TurnDelta, AiError>`. Pipe it through
`Loop.streamUntilComplete` inside a loop body, or consume the deltas
directly for one-shot calls.
## Models

`OpenAIModel` is a literal union with a `(string & {})` tail - you get
autocomplete on known IDs but can pass any string for models the SDK
hasn’t been updated for yet.
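The union trick itself is plain TypeScript and can be sketched in isolation. `KnownModel` here is a stand-in subset, not the library's full list, and `gpt-6-preview` is a made-up future id:

```ts
// Known IDs stay literal, so editors autocomplete them; the
// (string & {}) tail keeps arbitrary strings assignable without
// collapsing the whole union down to plain string.
type KnownModel = "gpt-5.4-mini" | "gpt-4.1-mini"
type OpenAIModel = KnownModel | (string & {})

const known: OpenAIModel = "gpt-5.4-mini"   // autocompleted literal
const future: OpenAIModel = "gpt-6-preview" // hypothetical, still accepted
```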
Known IDs (as of April 2026): `gpt-5.5`, `gpt-5.5-pro`, `gpt-5.4`,
`gpt-5.4-pro`, `gpt-5.4-mini`, `gpt-5.4-nano`, `gpt-5`, `gpt-5-mini`,
`gpt-5-nano`, `gpt-5.3-codex`, `gpt-4.1`, `gpt-4.1-mini`,
`gpt-4o-mini`. Reference: OpenAI models.
## Errors

HTTP failures map to typed `AiError` variants:

| Status | Error |
|---|---|
| 429 | `AiError.RateLimited` |
| 408/504 | `AiError.Timeout` |
| 401 | `AiError.AuthFailed` (auth) |
| 403 | `AiError.AuthFailed` (permission) |
| 402 | `AiError.AuthFailed` (billing) |
| 413 | `AiError.ContextLengthExceeded` |
| >= 500 | `AiError.Unavailable` |
| other 4xx | `AiError.InvalidRequest` |
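The table reads as a simple status switch. This hypothetical helper mirrors it for illustration; the real mapping lives inside the provider, and order matters so that 504 hits the Timeout branch before the >= 500 catch-all:

```ts
// Hypothetical mirror of the status-to-error table above; the returned
// strings match the AiError variant names.
const toErrorTag = (status: number): string => {
  if (status === 429) return "RateLimited"
  if (status === 408 || status === 504) return "Timeout"
  if (status === 401 || status === 403 || status === 402) return "AuthFailed"
  if (status === 413) return "ContextLengthExceeded"
  if (status >= 500) return "Unavailable"
  return "InvalidRequest"
}
```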
Recover per-tag with `Stream.catchTag("RateLimited", handler)`. See
multi-model fallback for cross-provider recovery.