Responses / OpenAI
The Responses provider wraps OpenAI’s /v1/responses SSE endpoint and
maps it onto the core LanguageModelService shape. Reasoning models,
tool calls, and response storage are all first-class via the typed
ResponsesRequestOptions.
Install

    pnpm add @effect-uai/core @effect-uai/responses effect

Wire it up

    import { Config, Effect, Layer } from "effect"
    import { FetchHttpClient } from "effect/unstable/http"
    import { Responses, layer as responsesLayer } from "@effect-uai/responses"

    const provider = Layer.unwrap(
      Effect.gen(function* () {
        const apiKey = yield* Config.redacted("OPENAI_API_KEY")
        return responsesLayer({ apiKey })
      }),
    )

    const mainLayer = provider.pipe(Layer.provide(FetchHttpClient.layer))

responsesLayer registers two service tags from one underlying
implementation:
- Responses - the typed tag. Yield this when you want Responses-specific options (reasoning.effort, store, previousResponseId).
- LanguageModel - the generic tag. Yield this in provider-portable code; only CommonRequestOptions is accepted at the call site.
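
The idea of one implementation exposed through a typed and a generic view can be sketched in plain TypeScript. This is only an illustration under stated assumptions: CommonOptions, ResponsesOptions, impl, and describe are simplified, hypothetical stand-ins, not the library's actual types or service tags.

```typescript
// Simplified, hypothetical stand-ins; the real types and tags come from the library.
interface CommonOptions {
  readonly model: string
  readonly temperature?: number
}

interface ResponsesOptions extends CommonOptions {
  readonly reasoning?: { readonly effort: "low" | "medium" | "high" }
  readonly store?: boolean
}

// One implementation that understands the full, provider-specific options.
const impl = {
  describe: (options: ResponsesOptions): string =>
    options.reasoning
      ? `${options.model} (effort: ${options.reasoning.effort})`
      : options.model,
}

// Two views of the same object: the typed view exposes every option,
// the generic view only admits the common options at the call site.
const typedView: { describe(options: ResponsesOptions): string } = impl
const genericView: { describe(options: CommonOptions): string } = impl

const typedResult = typedView.describe({
  model: "gpt-5.4-mini",
  reasoning: { effort: "low" },
})
const genericResult = genericView.describe({ model: "gpt-5.4-mini" })
```

Passing reasoning or store as an object literal to genericView.describe would be rejected by the compiler, which mirrors the restriction described for the generic tag.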
Config

    interface Config {
      readonly apiKey: Redacted.Redacted
      readonly baseUrl?: string // defaults to https://api.openai.com/v1
    }

The layer carries connection details only. model is per call (see
below). apiKey is always Redacted.Redacted, never a raw string.
Read it with Config.redacted("OPENAI_API_KEY") or wrap manually with
Redacted.make.
baseUrl exists for proxies / Azure / local LLM gateways that speak
the Responses protocol. Most apps leave it unset.
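
As a rough illustration of how an optional baseUrl might resolve to the endpoint being called, here is a minimal sketch. resolveResponsesUrl and DEFAULT_BASE_URL are hypothetical names, not part of @effect-uai/responses, and the sketch assumes the provider simply appends /responses to the base URL.

```typescript
// Hypothetical helper: not part of @effect-uai/responses.
const DEFAULT_BASE_URL = "https://api.openai.com/v1"

// Assumes the endpoint is `${baseUrl}/responses`. Trailing slashes are trimmed
// so a gateway URL such as "http://localhost:8080/v1/" still resolves cleanly.
function resolveResponsesUrl(baseUrl?: string): string {
  const base = (baseUrl ?? DEFAULT_BASE_URL).replace(/\/+$/, "")
  return `${base}/responses`
}

const defaultUrl = resolveResponsesUrl()
const gatewayUrl = resolveResponsesUrl("http://localhost:8080/v1/")
```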
Request shape

    interface ResponsesRequest extends Omit<CommonRequest, "model"> {
      readonly model: OpenAIModel // narrows CommonRequest.model: string
      readonly reasoning?: { readonly effort: "low" | "medium" | "high" }
      readonly store?: boolean
      readonly previousResponseId?: string
    }

On top of the core CommonRequest (history, model, tools,
toolChoice, temperature, maxOutputTokens):
- model - typed against OpenAIModel for autocomplete at the call site.
- reasoning.effort - reasoning depth for gpt-5.x models. With effort set, the model produces reasoning tokens before any output tokens, so streaming text deltas don’t start immediately. Drop it for latency-sensitive flows.
- store - persist the response on OpenAI’s side so it can be referenced via previousResponseId on a later turn.
- previousResponseId - resume from a stored response without re-sending the full history. See the pause and resume recipe.
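
To make the interplay of store and previousResponseId concrete, the following sketch builds the request objects for two consecutive turns. It is an assumption-laden illustration: HistoryItem, SketchRequest, firstTurn, and followUp are hypothetical names, the history item shape is invented for the example, and the response ID is assumed to be available once the first turn completes.

```typescript
// Simplified, hypothetical stand-ins for the request types.
type HistoryItem = { readonly role: "user" | "assistant"; readonly text: string }

interface SketchRequest {
  readonly history: ReadonlyArray<HistoryItem>
  readonly model: string
  readonly store?: boolean
  readonly previousResponseId?: string
}

// First turn: send the full history and ask for the response to be persisted.
function firstTurn(history: ReadonlyArray<HistoryItem>, model: string): SketchRequest {
  return { history, model, store: true }
}

// Later turn: reference the stored response and send only the new items.
// `responseId` is assumed to come from the completed first turn; `store: true`
// is kept here on the assumption that the conversation may be resumed again.
function followUp(
  responseId: string,
  newItems: ReadonlyArray<HistoryItem>,
  model: string,
): SketchRequest {
  return { history: newItems, model, store: true, previousResponseId: responseId }
}

const opening = firstTurn([{ role: "user", text: "Summarise the report." }], "gpt-5.4-mini")
const resumed = followUp("resp_123", [{ role: "user", text: "Now shorten it." }], "gpt-5.4-mini")
```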
Calling it

    import { Effect, Stream } from "effect"
    import { Responses } from "@effect-uai/responses"

    const turn = Effect.gen(function* () {
      const oai = yield* Responses
      return oai.streamTurn({
        history,
        model: "gpt-5.4-mini",
        tools,
        reasoning: { effort: "low" },
      })
    })

streamTurn returns Stream<TurnDelta, AiError>. Pipe it through
Loop.onTurnComplete inside a loop body, or consume the deltas
directly for one-shot calls.
Models
OpenAIModel is a literal union with a (string & {}) tail - you get
autocomplete on known IDs but can pass any string for models the SDK
hasn’t been updated for yet.
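
The literal-union-with-open-tail pattern can be seen in isolation in the sketch below. SketchModel is a shortened, hypothetical stand-in for OpenAIModel, and describeModel exists only for the example.

```typescript
// A shortened stand-in for OpenAIModel; the real union lists more IDs.
type SketchModel = "gpt-5.4" | "gpt-5.4-mini" | (string & {})

function describeModel(model: SketchModel): string {
  return `using ${model}`
}

// Known ID: offered by editor autocomplete.
const known = describeModel("gpt-5.4-mini")

// Unknown ID: still type-checks thanks to the (string & {}) tail.
const custom = describeModel("my-gateway-model")
```

Writing the tail as a plain string would collapse the whole union to string and lose the autocomplete suggestions; intersecting with {} keeps the literals visible to the editor while still accepting any string.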
Known IDs (as of April 2026): gpt-5.5, gpt-5.5-pro, gpt-5.4,
gpt-5.4-pro, gpt-5.4-mini, gpt-5.4-nano, gpt-5, gpt-5-mini,
gpt-5-nano, gpt-5.3-codex, gpt-4.1, gpt-4.1-mini,
gpt-4o-mini. Reference: OpenAI models.
Errors
HTTP failures map to typed AiError variants:
| Status | Error |
|---|---|
| 429 | AiError.RateLimited |
| 408/504 | AiError.Timeout |
| 401 | AiError.AuthFailed (auth) |
| 403 | AiError.AuthFailed (permission) |
| 402 | AiError.AuthFailed (billing) |
| 413 | AiError.ContextLengthExceeded |
| >= 500 | AiError.Unavailable |
| other 4xx | AiError.InvalidRequest |
Recover per-tag with Stream.catchTag("RateLimited", handler). See
multi-model fallback for cross-provider
recovery.
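
The status table above can be read as a simple classification function. The sketch below is not the library's implementation: SketchError and classifyStatus are hypothetical, and the reason field is an assumed way to carry the parenthesised detail shown for AuthFailed. It mainly shows that 504 has to be matched before the generic >= 500 rule.

```typescript
// Tag names follow the table; the `reason` field is a hypothetical way
// to carry the parenthesised detail for AuthFailed.
type SketchError =
  | { readonly tag: "RateLimited" }
  | { readonly tag: "Timeout" }
  | { readonly tag: "AuthFailed"; readonly reason: "auth" | "permission" | "billing" }
  | { readonly tag: "ContextLengthExceeded" }
  | { readonly tag: "Unavailable" }
  | { readonly tag: "InvalidRequest" }

function classifyStatus(status: number): SketchError {
  if (status === 429) return { tag: "RateLimited" }
  // 504 must be checked before the generic >= 500 branch.
  if (status === 408 || status === 504) return { tag: "Timeout" }
  if (status === 401) return { tag: "AuthFailed", reason: "auth" }
  if (status === 403) return { tag: "AuthFailed", reason: "permission" }
  if (status === 402) return { tag: "AuthFailed", reason: "billing" }
  if (status === 413) return { tag: "ContextLengthExceeded" }
  if (status >= 500) return { tag: "Unavailable" }
  // Remaining 4xx codes; this sketch assumes it is only called for failures.
  return { tag: "InvalidRequest" }
}
```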