OpenAI

OpenAI ships sync transcription via REST, realtime transcription via WebSocket, and text-to-speech via chunked HTTP. Each lives at its own subpath so the realtime peer dep doesn’t infect sync-only builds.

Install

pnpm add @effect-uai/core @effect-uai/openai effect

The realtime transcriber additionally needs ws (a peer dep):

pnpm add ws

ws is only pulled in by @effect-uai/openai/OpenAIRealtimeTranscriber. The sync OpenAITranscriber and OpenAISynthesizer paths don’t require it — edge / browser builds stay slim.

Layers

Layer | Registers | Capability markers
@effect-uai/openai/OpenAITranscriber | OpenAITranscriber + Transcriber |
@effect-uai/openai/OpenAIRealtimeTranscriber | OpenAIRealtimeTranscriber + Transcriber | SttStreaming
@effect-uai/openai/OpenAISynthesizer | OpenAISynthesizer + SpeechSynthesizer | none (no TtsIncrementalText: OpenAI has no /stream-input endpoint)

import { Config, Effect, Layer } from "effect"
import { FetchHttpClient } from "effect/unstable/http"
import { layer as transcriberLayer } from "@effect-uai/openai/OpenAITranscriber"
import { layer as realtimeLayer } from "@effect-uai/openai/OpenAIRealtimeTranscriber"
import { layer as synthLayer } from "@effect-uai/openai/OpenAISynthesizer"

const openai = Layer.unwrap(
  Effect.gen(function* () {
    const apiKey = yield* Config.redacted("OPENAI_API_KEY")
    return Layer.mergeAll(
      transcriberLayer({ apiKey }), // sync STT
      realtimeLayer({ apiKey }), // streaming STT
      synthLayer({ apiKey }), // sync + chunked TTS
    )
  }),
)

const mainLayer = openai.pipe(Layer.provide(FetchHttpClient.layer))

Models

STT

ModelSyncStreamingNotes
gpt-4o-transcribe✓ (?intent=transcription)Plain text only
gpt-4o-mini-transcribePlain text only, cheaper
whisper-1Only model supporting wordTimestamps: true

wordTimestamps: true requires whisper-1. Passing it with a GPT-4o model fails with AiError.Unsupported. Diarization isn’t offered on the OpenAI transcription endpoint at all.
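
The whisper-1 rule above can be sketched as a small pre-flight check. This is a minimal illustration, not the library's implementation: TranscribeModel, TimestampCheck, CheckResult, and checkWordTimestamps are hypothetical names, and the real package reports the failure as AiError.Unsupported rather than a plain result object.

```typescript
// Hypothetical names throughout; a sketch of the rule, not the package's code.
type TranscribeModel = "gpt-4o-transcribe" | "gpt-4o-mini-transcribe" | "whisper-1"

interface TimestampCheck {
  readonly model: TranscribeModel
  readonly wordTimestamps?: boolean
}

type CheckResult =
  | { readonly ok: true }
  | { readonly ok: false; readonly reason: string }

// Rejects wordTimestamps: true unless the model is whisper-1.
function checkWordTimestamps(req: TimestampCheck): CheckResult {
  if (req.wordTimestamps === true && req.model !== "whisper-1") {
    return { ok: false, reason: `wordTimestamps requires whisper-1, got ${req.model}` }
  }
  return { ok: true }
}
```

Running a check like this before building the multipart body keeps the failure local and avoids spending a network round trip on a request that cannot succeed.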

TTS

Model | Streaming | Notes
gpt-4o-mini-tts | chunked HTTP | Current steerable model; honors instructions
tts-1 / tts-1-hd | chunked HTTP | Legacy; ignore instructions silently

Stock voices (no custom-voice path): alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse. ballad, coral, and verse are gpt-4o-mini-tts-only. Because there’s no clone path, OpenAISynthesizeRequest.voiceId narrows to the stock-only literal union — passing an arbitrary string is a type error.
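
To illustrate how a stock-only literal union behaves, here is a sketch that derives the union from the voice list above and adds a runtime guard for untrusted strings (for example, a voice name read from config). StockVoice, TtsModel, isStockVoice, and voiceWorksWith are hypothetical names for this example; the package's own type is OpenAIVoiceId, and its definition may differ.

```typescript
// Voice names copied from the list above; all identifiers here are illustrative.
const STOCK_VOICES = [
  "alloy", "ash", "ballad", "coral", "echo", "fable",
  "onyx", "nova", "sage", "shimmer", "verse",
] as const

type StockVoice = (typeof STOCK_VOICES)[number]
type TtsModel = "gpt-4o-mini-tts" | "tts-1" | "tts-1-hd"

// Voices the text above describes as gpt-4o-mini-tts-only.
const MINI_TTS_ONLY: ReadonlySet<StockVoice> = new Set<StockVoice>(["ballad", "coral", "verse"])

// Runtime guard: narrows an arbitrary string to the literal union.
function isStockVoice(value: string): value is StockVoice {
  return (STOCK_VOICES as ReadonlyArray<string>).includes(value)
}

// True when the voice is usable with the given model.
function voiceWorksWith(voice: StockVoice, model: TtsModel): boolean {
  return model === "gpt-4o-mini-tts" || !MINI_TTS_ONLY.has(voice)
}
```

With a union like this, `const v: StockVoice = "my-clone"` is rejected at compile time, while isStockVoice covers values that only exist at runtime.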

Request shape

// STT sync
type OpenAITranscribeRequest = {
  readonly model: OpenAITranscribeModel
  readonly audio: AudioSource
  readonly language?: string
  readonly prompt?: string | { readonly terms: ReadonlyArray<string> }
  readonly wordTimestamps?: boolean // whisper-1 only
  readonly temperature?: number
  readonly fileName?: string // overrides multipart filename
}

// TTS sync + chunked
type OpenAISynthesizeRequest = {
  readonly model: OpenAITtsModel
  readonly voiceId: OpenAIVoiceId // stock-only literal union
  readonly text: string
  readonly outputFormat?: AudioFormat
  readonly speed?: number
  readonly instructions?: string // gpt-4o-mini-tts only
}

instructions is a free-form prompt for tone, emotion, pacing — “sound apologetic,” “read this slowly with emphasis on the second sentence.” Honored only by gpt-4o-mini-tts; silently ignored by the legacy tts-1 family.

Wire / auth notes

Realtime STT uses wss://api.openai.com/v1/realtime?intent=transcription and requires two upgrade headers: Authorization: Bearer … and OpenAI-Beta: realtime=v1. Browser WebSocket can’t set headers, so OpenAIRealtimeTranscriber uses the ws peer dep to construct the socket with those headers — that’s why this transcriber lives at a separate subpath. Use it from Node / Bun; for browser deployments, proxy through a server.
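
As a rough sketch of the handshake described above, the helper below assembles the URL and the two upgrade headers. realtimeHandshake and RealtimeHandshake are hypothetical names; the package builds the socket internally, and this only shows what that construction plausibly looks like with ws.

```typescript
// Hypothetical helper; shows the URL and headers named in the text above.
interface RealtimeHandshake {
  readonly url: string
  readonly headers: Readonly<Record<string, string>>
}

function realtimeHandshake(apiKey: string): RealtimeHandshake {
  return {
    url: "wss://api.openai.com/v1/realtime?intent=transcription",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "OpenAI-Beta": "realtime=v1",
    },
  }
}

// With the ws peer dep installed (Node / Bun), the socket could be opened as:
//   import WebSocket from "ws"
//   const { url, headers } = realtimeHandshake(process.env.OPENAI_API_KEY ?? "")
//   const socket = new WebSocket(url, { headers })
// The browser WebSocket constructor has no equivalent of the headers option.
```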

Realtime expects PCM s16le at 24 kHz (not the 16 kHz most other providers use). Set inputFormat accordingly on the streaming request, or the upstream rejects the audio.
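
If the capture path produces Float32 samples (as Web Audio and many native APIs do), they need converting to s16le before sending. A minimal sketch, assuming the input is already mono at 24 kHz; floatToPcm16le and pcmBytesForMs are hypothetical helpers, and resampling from another rate is out of scope here.

```typescript
const REALTIME_SAMPLE_RATE = 24000 // Hz, per the realtime input requirement above
const BYTES_PER_SAMPLE = 2 // signed 16-bit

// Converts mono Float32 samples in [-1, 1] to little-endian signed 16-bit PCM.
function floatToPcm16le(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * BYTES_PER_SAMPLE)
  const view = new DataView(out.buffer)
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample))
    // Negative values scale to -32768, positive values to 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff
    view.setInt16(i * BYTES_PER_SAMPLE, Math.round(scaled), true) // true = little-endian
  })
  return out
}

// Byte length of a chunk of the given duration at 24 kHz mono s16le.
function pcmBytesForMs(ms: number): number {
  return Math.round((REALTIME_SAMPLE_RATE * ms) / 1000) * BYTES_PER_SAMPLE
}
```

For sizing buffers: 100 ms of audio in this format is 2,400 samples, or 4,800 bytes.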

Output formats for TTS: mp3, opus, aac, flac, wav, pcm. pcm is 24 kHz mono s16le, suitable for direct AudioWorklet playback.
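
For the pcm output format, playback code typically wants Float32 samples. The sketch below decodes one chunk of 24 kHz mono s16le bytes; pcm16leToFloat32 is a hypothetical helper, not part of the package. Chunked HTTP can split a chunk mid-sample, and this simplified version drops a trailing odd byte, whereas a real player would carry it over to the next chunk.

```typescript
// Decodes little-endian signed 16-bit PCM into Float32 samples in [-1, 1).
function pcm16leToFloat32(bytes: Uint8Array): Float32Array {
  const sampleCount = Math.floor(bytes.byteLength / 2) // ignore a trailing odd byte
  const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2)
  const out = new Float32Array(sampleCount)
  for (let i = 0; i < sampleCount; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000 // true = little-endian
  }
  return out
}
```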

Errors

Standard HTTP → AiError mapping applies:

Status | Error
429 | AiError.RateLimited
408/504 | AiError.Timeout
401 | AiError.AuthFailed (auth)
>= 500 | AiError.Unavailable
other 4xx | AiError.InvalidRequest

wordTimestamps: true against a non-whisper-1 model → AiError.Unsupported at request time.
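
The status mapping above can be read as an ordered series of checks. The sketch below uses plain string tags in place of the real AiError constructors, whose shapes are not shown here; errorTagForStatus and AiErrorTag are hypothetical names. Order matters: 504 has to be tested before the generic >= 500 branch, or it would be classified as Unavailable.

```typescript
// String tags stand in for the AiError variants; names are illustrative.
type AiErrorTag = "RateLimited" | "Timeout" | "AuthFailed" | "Unavailable" | "InvalidRequest"

// Returns the error tag for a failing status, or undefined for non-error statuses.
function errorTagForStatus(status: number): AiErrorTag | undefined {
  if (status === 429) return "RateLimited"
  if (status === 408 || status === 504) return "Timeout"
  if (status === 401) return "AuthFailed"
  if (status >= 500) return "Unavailable"
  if (status >= 400) return "InvalidRequest"
  return undefined
}
```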

See also

  • Speech overview — generic tags and capability markers.
  • Voice loop — uses ElevenLabs by default; the recipe’s runPipeline typechecks against either provider via the marker contract.
  • Streaming transcription — default provider is OpenAI Realtime.