
Inworld

Inworld covers the full speech surface: sync and streaming for both transcription (STT) and synthesis (TTS), with both capability markers (SttStreaming, TtsIncrementalText) shipped. It also adds router-style passthroughs that proxy to other STT providers (AssemblyAI, Soniox, Groq Whisper) under the same Inworld auth and billing.

Install

pnpm add @effect-uai/core @effect-uai/inworld effect

Layers

| Layer | Registers | Capability markers |
| --- | --- | --- |
| @effect-uai/inworld/InworldTranscriber | InworldTranscriber + Transcriber | none (sync) |
| @effect-uai/inworld/InworldRealtimeTranscriber | InworldRealtimeTranscriber + Transcriber | SttStreaming |
| @effect-uai/inworld/InworldSynthesizer | InworldSynthesizer + SpeechSynthesizer | none (sync + chunked NDJSON) |
| @effect-uai/inworld/InworldRealtimeSynthesizer | InworldRealtimeSynthesizer + SpeechSynthesizer | TtsIncrementalText |

The sync and realtime layers are separate so you can pull in only what you need; the realtime paths add WebSocket and JWT plumbing.

import { Config, Effect, Layer } from "effect"
import { FetchHttpClient } from "effect/unstable/http"
import * as Socket from "effect/unstable/socket/Socket"
import { layer as realtimeTranscriber } from "@effect-uai/inworld/InworldRealtimeTranscriber"
import { layer as realtimeSynth } from "@effect-uai/inworld/InworldRealtimeSynthesizer"

// Read the API key from config once and build both realtime layers from it.
const inworld = Layer.unwrap(
  Effect.gen(function* () {
    const apiKey = yield* Config.redacted("INWORLD_API_KEY")
    return Layer.mergeAll(realtimeTranscriber({ apiKey }), realtimeSynth({ apiKey }))
  }),
)

// The realtime layers require an HttpClient and a WebSocket constructor.
const mainLayer = inworld.pipe(
  Layer.provide(FetchHttpClient.layer),
  Layer.provide(Socket.layerWebSocketConstructorGlobal),
)

Models

STT

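| Model | Provider | Streaming WS |
| --- | --- | --- |
| inworld/inworld-stt-1 | First-party (experimental) | Yes |
| assemblyai/universal-streaming-english | AssemblyAI (passthrough) | Yes |
| assemblyai/universal-streaming-multilingual | AssemblyAI (passthrough) | Yes |
| assemblyai/u3-rt-pro | AssemblyAI (passthrough) | Yes |
| assemblyai/whisper-rt | AssemblyAI (passthrough) | Yes |
| soniox/stt-rt-v4 | Soniox (passthrough) | Yes |
| groq/whisper-large-v3 | Groq (passthrough) | No (sync only) |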

Passthrough models are billed against your Inworld key, so you need no separate contracts with the upstream providers. Sync STT works with every model; streaming over WebSocket works with every model except groq/whisper-large-v3.
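If you route requests across these models, a lookup like the one below can keep streaming code paths away from sync-only models. This is a minimal sketch, not part of @effect-uai/inworld: the SttModel union, supportsStreaming, and providerOf are hypothetical names, and the data is copied by hand from the table above, so it will not pick up new models automatically.

```typescript
// Hypothetical helper, not exported by @effect-uai/inworld.
// Model IDs copied by hand from the STT table.
type SttModel =
  | "inworld/inworld-stt-1"
  | "assemblyai/universal-streaming-english"
  | "assemblyai/universal-streaming-multilingual"
  | "assemblyai/u3-rt-pro"
  | "assemblyai/whisper-rt"
  | "soniox/stt-rt-v4"
  | "groq/whisper-large-v3"

// Models that only work through the sync (REST) endpoint.
const SYNC_ONLY: ReadonlySet<SttModel> = new Set<SttModel>(["groq/whisper-large-v3"])

// True when the model can be used with the realtime (WebSocket) transcriber.
function supportsStreaming(model: SttModel): boolean {
  return !SYNC_ONLY.has(model)
}

// Upstream provider, read from the "provider/" prefix of the model ID.
function providerOf(model: SttModel): string {
  return model.slice(0, model.indexOf("/"))
}
```

Keeping the sync-only set small (rather than listing every streaming model) means a newly added union member defaults to "streaming supported", matching the table, where only one model is the exception.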

TTS

| Model | Latency (P50) | Languages | Notes |
| --- | --- | --- | --- |
| inworld-tts-2 | ~200 ms | 100+ | Flagship; honors deliveryMode |
| inworld-tts-1.5-max | ~200 ms | 15 | |
| inworld-tts-1.5-mini | ~120 ms | 15 | Lowest latency |
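The trade-off in the table (latency versus language coverage versus deliveryMode support) can be encoded as data. A minimal sketch, assuming you want the lowest-latency model that meets your constraints; TtsModelInfo, TTS_MODELS, and pickTtsModel are hypothetical and not exported by the library, the latencies are the approximate P50 figures from the table, and "100+" languages is stored as the lower bound 100.

```typescript
// Hypothetical helper, not exported by @effect-uai/inworld.
type TtsModelId = "inworld-tts-2" | "inworld-tts-1.5-max" | "inworld-tts-1.5-mini"

type TtsModelInfo = {
  readonly id: TtsModelId
  readonly p50Ms: number // approximate P50 latency from the table
  readonly languages: number // lower bound ("100+" stored as 100)
  readonly deliveryMode: boolean // whether deliveryMode is honored
}

// Listed in table order; ties on latency keep this order.
const TTS_MODELS: ReadonlyArray<TtsModelInfo> = [
  { id: "inworld-tts-2", p50Ms: 200, languages: 100, deliveryMode: true },
  { id: "inworld-tts-1.5-max", p50Ms: 200, languages: 15, deliveryMode: false },
  { id: "inworld-tts-1.5-mini", p50Ms: 120, languages: 15, deliveryMode: false },
]

type TtsNeeds = {
  readonly minLanguages?: number
  readonly needsDeliveryMode?: boolean
}

// Lowest-latency model satisfying every requirement, or undefined if none does.
function pickTtsModel(needs: TtsNeeds): TtsModelId | undefined {
  const candidates = TTS_MODELS.filter(
    (m) =>
      m.languages >= (needs.minLanguages ?? 0) &&
      (!needs.needsDeliveryMode || m.deliveryMode),
  )
  // Array.prototype.sort is stable, so equal latencies keep table order.
  const sorted = [...candidates].sort((a, b) => a.p50Ms - b.p50Ms)
  return sorted[0]?.id
}
```

With no constraints this picks inworld-tts-1.5-mini; asking for deliveryMode or more than 15 languages forces inworld-tts-2.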

Voice IDs are human-readable names (“Sarah”, “Edward”, …), but Inworld doesn’t publish a list-voices REST endpoint, so browse the available voices in the Inworld Portal. InworldVoiceId is therefore typed as a plain string.

Request shape

// TTS sync + streaming
type InworldSynthesizeRequest = {
  readonly model: InworldTtsModel
  readonly voiceId: InworldVoiceId
  readonly text: string // omitted on streamSynthesisFrom
  readonly outputFormat?: AudioFormat
  readonly speed?: number
  readonly temperature?: number // (0, 2]
  readonly deliveryMode?: "STABLE" | "BALANCED" | "CREATIVE" // tts-2 only
  readonly applyTextNormalization?: "ON" | "OFF" // default "ON"
}

deliveryMode is the style-steering knob on inworld-tts-2: STABLE for consistent reading-voice output, CREATIVE for more expressive prosody. Older models ignore it silently.
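Because older models drop deliveryMode without an error, a pre-flight check can make the mismatch visible. A minimal sketch, assuming you validate requests yourself before calling the synthesizer: requestWarnings and RequestLike are hypothetical, and RequestLike is a simplified local stand-in for InworldSynthesizeRequest that keeps only the fields with documented constraints (temperature in (0, 2], deliveryMode on inworld-tts-2 only).

```typescript
// Hypothetical pre-flight check, not part of @effect-uai/inworld.
// RequestLike is a simplified local stand-in for InworldSynthesizeRequest.
type DeliveryMode = "STABLE" | "BALANCED" | "CREATIVE"

type RequestLike = {
  readonly model: string
  readonly temperature?: number
  readonly deliveryMode?: DeliveryMode
}

function requestWarnings(req: RequestLike): string[] {
  const warnings: string[] = []
  // temperature must lie in the half-open interval (0, 2]; NaN also fails this test.
  if (req.temperature !== undefined && !(req.temperature > 0 && req.temperature <= 2)) {
    warnings.push(`temperature ${req.temperature} is outside (0, 2]`)
  }
  // Only inworld-tts-2 honors deliveryMode; older models ignore it silently.
  if (req.deliveryMode !== undefined && req.model !== "inworld-tts-2") {
    warnings.push(`deliveryMode ${req.deliveryMode} is ignored by ${req.model}`)
  }
  return warnings
}
```

Returning warnings instead of throwing leaves the policy to the caller: log them in development, or turn them into a failure in strict pipelines.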

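Setting applyTextNormalization: "OFF" skips the server-side text normalizer (which expands numbers, abbreviations, and similar tokens). It is faster, but you become responsible for writing such tokens out and for pacing the speech through punctuation.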

Wire / auth notes

  • Sync endpoints: REST, authenticated with a bearer token in the Authorization header. Sync TTS returns either a single base64-encoded audio blob (/tts/v1/voice) or NDJSON with one audio chunk per line (/tts/v1/voice:stream).
  • Realtime endpoints: a short-lived JWT is minted from the API key (wsAuth.ts) and passed as a bearer token on the WebSocket upgrade. Realtime STT expects PCM s16le audio at 16 kHz, mono.
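Web Audio capture typically yields Float32 samples in [-1, 1], so microphone audio usually needs converting before it goes to realtime STT. A minimal sketch, assuming the input is already mono and resampled to 16 kHz (resampling is not shown); floatToPcm16le and chunkBytes are hypothetical helpers, not library exports.

```typescript
// Hypothetical helpers, not part of @effect-uai/inworld.
// Assumes mono Float32 input already sampled at 16 kHz.
const SAMPLE_RATE_HZ = 16_000
const BYTES_PER_SAMPLE = 2 // 16-bit samples

// Convert Float32 samples in [-1, 1] to PCM signed 16-bit little-endian bytes.
function floatToPcm16le(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * BYTES_PER_SAMPLE)
  const view = new DataView(out.buffer)
  samples.forEach((sample, i) => {
    // Clamp out-of-range input, then scale: negatives by 32768 and positives by
    // 32767, so both ends of the int16 range are reachable without overflow.
    const s = Math.max(-1, Math.min(1, sample))
    const int = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff)
    view.setInt16(i * BYTES_PER_SAMPLE, int, true) // true = little-endian
  })
  return out
}

// Byte size of `ms` milliseconds of 16 kHz mono s16le audio.
function chunkBytes(ms: number): number {
  return Math.round((SAMPLE_RATE_HZ * ms) / 1000) * BYTES_PER_SAMPLE
}
```

At 16 kHz mono s16le, 100 ms of audio is 16,000 × 0.1 × 2 = 3,200 bytes, which is what chunkBytes(100) returns.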

Audio encoding options for TTS (audioConfig.audioEncoding): LINEAR16, MP3, OGG_OPUS, ALAW, MULAW, FLAC, PCM, WAV. Caveat: sync LINEAR16 / WAV responses include a WAV header; streaming chunks don’t. The codec layer surfaces this via AudioFormat.container ("wav" vs "raw").
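If you handle response bytes yourself rather than relying on AudioFormat.container, for example to feed sync LINEAR16/WAV output and raw streaming chunks into the same PCM sink, the WAV header has to be dropped first. A minimal sketch: stripWavHeader and fourCC are hypothetical helpers, and the code walks the RIFF chunk list to find the data chunk instead of assuming a fixed 44-byte header, since WAV files may carry extra chunks before the audio.

```typescript
// Hypothetical helpers, not part of @effect-uai/inworld.

// Read a 4-character RIFF chunk ID at `offset`.
function fourCC(bytes: Uint8Array, offset: number): string {
  return Array.from(bytes.subarray(offset, offset + 4), (c) => String.fromCharCode(c)).join("")
}

// Return the PCM payload of a WAV buffer, or the input unchanged if it has no
// RIFF/WAVE header (i.e. it is already raw, like a streaming chunk).
function stripWavHeader(bytes: Uint8Array): Uint8Array {
  if (bytes.length < 12 || fourCC(bytes, 0) !== "RIFF" || fourCC(bytes, 8) !== "WAVE") {
    return bytes
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  let offset = 12
  // Each chunk: 4-byte ID, 4-byte little-endian size, payload padded to even length.
  while (offset + 8 <= bytes.length) {
    const id = fourCC(bytes, offset)
    const size = view.getUint32(offset + 4, true)
    const body = offset + 8
    if (id === "data") {
      // Clamp in case the declared size overruns the buffer (e.g. a placeholder size).
      return bytes.subarray(body, Math.min(body + size, bytes.length))
    }
    offset = body + size + (size % 2)
  }
  throw new Error("WAV buffer has no data chunk")
}
```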

Errors

HTTP errors use the standard HTTP → AiError mapping. WebSocket failures surface on the stream’s error channel; non-fatal mid-stream errors arrive as TranscriptEvent values with _tag: "error".

See also