
Migrating to 0.7

0.7 is a capability-honesty pass across every audio and embedding surface. The unifying rule: stop lying. Where a provider cannot honor a request, the call now fails with AiError.Unsupported (load-bearing gaps) or emits a structured warnDropped (best-effort hints), instead of silently substituting a different result. Alongside that, Duration replaces raw durationSeconds everywhere audio carries a length, the MusicGenerator surface is reshaped, and an ElevenLabs music provider lands.

Most of it is mechanical (find-and-replace renames plus a Duration.seconds(n) wrap). The parts that need judgement are the removed GeminiTranscriber (use OpenAI / ElevenLabs / Inworld instead) and the requests that now error where they previously degraded silently.

The additive parts (new ElevenLabs music adapter, the Capabilities warn-and-drop helper, ElevenLabs pronunciation dictionaries, the new recipe runner) need no migration. See New capabilities at the end.

At a glance

| Area | What changed | Mechanical? |
| --- | --- | --- |
| Audio | AudioBlob.durationSeconds: number becomes duration: Duration.Duration (flows through STT TranscriptResult, TTS, music) | find-and-replace + Duration helpers |
| STT | GeminiTranscriber removed; prompt splits into prompt + biasingTerms; TranscriptResult.durationSeconds becomes duration; stream format gap now Unsupported | drop Gemini STT, rename fields |
| TTS | PhoneticEncoding + CustomPronunciation.encoding removed (IPA-only); pronunciations now fail Unsupported on providers without an IPA path; DialogueTurn trimmed to { voiceId, text } | drop encoding, handle Unsupported or switch |
| Embeddings | EmbedEncoding trimmed to float32 \| int8 \| binary (sparse / multivector move to JinaEncoding); mismatched encoding / image / multi-part now Unsupported | move non-dense calls to the typed JinaEmbedding |
| Music | prompts to prompt, drop bpm / scale / instrumental, MusicResult composes AudioBlob, generate returns GenerateResult, stream emits MusicStreamEvent | renames + .primary / .audio at call sites |
| LLM | Gemini toolChoice now mapped (was forced AUTO); Gemini URL images now Unsupported; Lyria clip honestly reports mp3 | behavior only, no rename |

The STT, TTS, and embedding sections come first because they touch the most call sites; the music reshape follows and also covers the AudioBlob change; the LLM behavior fixes (no rewrites) are last.

Speech-to-text (Transcriber)

GeminiTranscriber is removed

@effect-uai/google’s GeminiTranscriber rode on :generateContent (an LLM with a hardcoded “transcribe” prompt), not a real transcription endpoint. It had no native word timestamps, no diarization, no structured transcription semantics. It is deleted entirely.

// Before (0.6) — no longer resolves
import * as GeminiTranscriber from "@effect-uai/google/GeminiTranscriber"

Use an in-tree transcription provider instead: OpenAITranscriber (@effect-uai/openai), ElevenLabsTranscriber (@effect-uai/elevenlabs, diarization on the sync method), or InworldTranscriber (@effect-uai/inworld). Real Google STT (Cloud Speech-to-Text V2 / Chirp 3) is designed but not yet in-tree.

prompt splits into prompt + biasingTerms

The old prompt?: string | { terms: string[] } union conflated two orthogonal mechanisms. They are now separate fields on CommonTranscribeRequest: prompt is a free-form prose context hint (OpenAI honors it), biasingTerms is discrete vocabulary to boost (ElevenLabs keyterms, Inworld prompts). Each maps to a structured wire field or warnDropped; neither is stuffed into the other.

// Before (0.6)
transcribe({ model: "whisper-1", prompt: { terms: ["Anthropic", "Effect"] } })
// After (0.7)
transcribe({ model: "whisper-1", biasingTerms: ["Anthropic", "Effect"] })
// Prose context hints stay on `prompt`:
transcribe({ model: "whisper-1", prompt: "A podcast about functional programming." })
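
If many call sites still build the old union, a small adapter can split it in one place. The sketch below is a hypothetical helper (splitLegacyPrompt and both type names are made up for illustration, not part of @effect-uai); it only encodes the rule stated above, that a plain string stays on prompt and a { terms } object moves to biasingTerms.

```typescript
// Hypothetical migration helper; the names here are illustrative only.
type LegacyPrompt = string | { readonly terms: ReadonlyArray<string> }

interface SplitPrompt {
  readonly prompt?: string
  readonly biasingTerms?: ReadonlyArray<string>
}

function splitLegacyPrompt(legacy: LegacyPrompt | undefined): SplitPrompt {
  // No hint at all: emit neither field.
  if (legacy === undefined) return {}
  // A plain string was always a prose context hint; it stays on `prompt`.
  if (typeof legacy === "string") return { prompt: legacy }
  // The `{ terms }` variant was discrete vocabulary; it moves to `biasingTerms`.
  return { biasingTerms: [...legacy.terms] }
}
```

A call site could then spread the result into the request, for example transcribe({ model: "whisper-1", ...splitLegacyPrompt(oldPrompt) }), and drop the helper once no caller produces the old union.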

TranscriptResult.durationSeconds becomes duration

// Before (0.6)
const secs = result.durationSeconds // number | undefined
// After (0.7)
import { Duration } from "effect"
const secs = result.duration ? Duration.toSeconds(result.duration) : undefined

Per-word offsets (WordTimestamp.startSeconds / endSeconds) stay raw number — they are positions in the audio, not durations.

Behavior changes (no rename)

  • Stream input-format gaps now fail AiError.Unsupported (a per-Layer capability gap) instead of InvalidRequest. Update catch clauses that match on the error tag.
  • OpenAI: diarization is removed from the OpenAITranscribeRequest type (OpenAI has no diarization). The old proactive per-model wordTimestamps guard is gone: wordTimestamps stays allowed on every model, and a non-whisper-1 model now surfaces the provider’s wire 400 rather than a pre-send Unsupported.
  • prompt / biasingTerms on a provider that lacks the field now emit a warnDropped (OpenAI drops biasingTerms; ElevenLabs / Inworld drop prompt). The call still succeeds.
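
The first bullet matters for handlers that branch on the error tag: a branch that only knew InvalidRequest will no longer see stream format gaps. The sketch below uses local stand-in types; the _tag field and the exact tag strings are assumptions modeled on the names in this guide, and classifyFailure is a hypothetical function, so check the real AiError shape before copying it.

```typescript
// Local stand-ins for the two error variants discussed above (assumed shape).
type AiErrorLike =
  | { readonly _tag: "Unsupported"; readonly message: string }
  | { readonly _tag: "InvalidRequest"; readonly message: string }

type Recovery = "switch-provider-or-format" | "fix-request-shape"

// Hypothetical classifier: a capability gap is recovered by changing the
// provider or the input format, a wire-shape error by fixing the request.
function classifyFailure(error: AiErrorLike): Recovery {
  switch (error._tag) {
    case "Unsupported":
      return "switch-provider-or-format"
    case "InvalidRequest":
      return "fix-request-shape"
  }
}
```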

Speech synthesis (TTS)

Pronunciations are IPA-only and load-bearing

PhoneticEncoding and the CustomPronunciation.encoding field are removed. pronunciation is always IPA; adapters translate to the provider’s wire form (inline /ipa/, SSML <phoneme>, or X-SAMPA) internally.

// Before (0.6)
pronunciations: [{ phrase: "Anthropic", pronunciation: "ænˈθrɒpɪk", encoding: "ipa" }]
// After (0.7)
pronunciations: [{ phrase: "Anthropic", pronunciation: "ænˈθrɒpɪk" }]

Because a mispronounced word is wrong audio, pronunciations are now load-bearing (bucket 1, as opposed to the best-effort bucket-2 hints that only warn): a provider with no stateless IPA path fails the call with AiError.Unsupported instead of silently dropping them.

  • Inworld supports inline IPA (no change).
  • OpenAI, Gemini, and modern ElevenLabs now fail Unsupported when pronunciations is non-empty. Remove the field, switch to Inworld, or (ElevenLabs) provision a dictionary and pass pronunciationDictionaryLocators (see New capabilities).
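
To find affected call sites before they fail at runtime, the two bullets above can be written down as a pre-flight check. This is a minimal sketch with hypothetical names (TtsProvider, INLINE_IPA, pronunciationsWouldFail); it treats every ElevenLabs model as lacking an inline IPA path, which simplifies the "modern ElevenLabs" wording above.

```typescript
type TtsProvider = "inworld" | "openai" | "gemini" | "elevenlabs"

interface Pronunciation {
  readonly phrase: string
  readonly pronunciation: string // always IPA in 0.7
}

// Taken from the list above: only Inworld keeps a stateless inline-IPA path.
const INLINE_IPA: Record<TtsProvider, boolean> = {
  inworld: true,
  openai: false,
  gemini: false,
  elevenlabs: false,
}

// True when a 0.7 synthesize call would be expected to fail with Unsupported.
function pronunciationsWouldFail(
  provider: TtsProvider,
  pronunciations?: ReadonlyArray<Pronunciation>,
): boolean {
  const nonEmpty = pronunciations !== undefined && pronunciations.length > 0
  return nonEmpty && !INLINE_IPA[provider]
}
```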

DialogueTurn trims to { voiceId, text }

styleDescription and speed were Hume-specific and silently ignored by the only in-tree dialogue endpoint (ElevenLabs /v1/text-to-dialogue). They are removed; they will return on a provider-typed turn extension when a Hume adapter lands.

// Before (0.6)
{ voiceId: "Rachel", text: "Hello.", styleDescription: "warm", speed: 0.9 }
// After (0.7)
{ voiceId: "Rachel", text: "Hello." }

Behavior changes (no rename)

  • AudioBlob.duration (see AudioBlob switches to Duration below): synthesize results read .duration, not .durationSeconds.
  • OpenAI / Gemini now emit warnDropped for languageCode (both auto-detect), and Gemini also for speed. The audio still renders.

Embeddings

EmbedEncoding is trimmed to the dense cross-provider set

// Before (0.6)
type EmbedEncoding = "float32" | "int8" | "binary" | "sparse" | "multivector"
// After (0.7)
type EmbedEncoding = "float32" | "int8" | "binary"

sparse and multivector are Jina-only structures, not a cross-provider request set. They move to the provider-typed JinaEncoding. If you passed them to the generic embed / embedMany, reach for the typed JinaEmbedding service:

// Before (0.6)
import { embed } from "@effect-uai/core/EmbeddingModel"
yield* embed({ model: "jina-embeddings-v4", input, encoding: "multivector" })
// After (0.7)
import { JinaEmbedding } from "@effect-uai/jina/JinaEmbedding"
const jina = yield* JinaEmbedding.asEffect()
yield* jina.embed({ model: "jina-embeddings-v4", input, encoding: "multivector" })

Responses are unaffected: EmbedResponse<E> / EmbedManyResponse<E> now range over the wider ResponseEncoding (float32 | int8 | binary | sparse | multivector), so a typed Jina response still carries sparse / multivector.

Mismatched encoding, image, and multi-part now fail honestly

The generic path used to take an unsupported encoding and return a float32 vector mislabeled as the requested type (a silent cast-lie). It now validates via the exported assertEncoding guard and fails Unsupported:

  • OpenAI / Gemini: any non-float32 encoding fails Unsupported (both emit float32 only). Omit encoding or pass "float32".
  • Jina (generic path): scalar int8 fails Unsupported. Jina honors float32 and binary (bit-quantized, packed into bytes), not scalar int8 per dimension.
  • OpenAI image input and Jina multi-part input now fail Unsupported (was InvalidRequest): they are capability gaps, not wire-shape errors. Update error handlers that match on the tag.
  • OpenAI task now emits warnDropped (no task field on any OpenAI model). Gemini task stays silent (honored on some models, ignored on others; no per-model tables).
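
The encoding rules in the first two bullets amount to a per-provider allow-list. The sketch below is a local stand-in, not the exported assertEncoding guard (whose signature this guide does not show); checkEncoding, SUPPORTED, and the result shape are hypothetical, and defaulting an omitted encoding to float32 is an assumption.

```typescript
type EmbedEncoding = "float32" | "int8" | "binary"
type EmbedProvider = "openai" | "gemini" | "jina"

// Allow-list transcribed from the bullets above (generic path only).
const SUPPORTED: Record<EmbedProvider, ReadonlyArray<EmbedEncoding>> = {
  openai: ["float32"],
  gemini: ["float32"],
  jina: ["float32", "binary"],
}

type EncodingCheck =
  | { readonly ok: true; readonly encoding: EmbedEncoding }
  | { readonly ok: false; readonly reason: string }

function checkEncoding(provider: EmbedProvider, requested?: EmbedEncoding): EncodingCheck {
  // Assumption: an omitted encoding means float32.
  const encoding = requested ?? "float32"
  return SUPPORTED[provider].includes(encoding)
    ? { ok: true, encoding }
    : { ok: false, reason: `${provider} cannot emit ${encoding} on the generic path` }
}
```

The point of the design is that a rejected request is reported as a failure up front; nothing is cast to the requested label after the fact.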

Music generation

The trimmed common request

The bpm, scale, and WeightedPrompt[] fields were on the Common request because they were native to Google Lyria RealTime. They aren’t native to any of the other 9 productized providers we surveyed (cross-provider matrix). The Lyria 3 sync adapter was silently splicing them into your prompt text on your behalf. That is client-side prompt construction, which the 0.7 rule forbids: prompts always come from the caller verbatim.

instrumental was on the Common request but ran into a similar problem: only 4 of 10 providers expose a structured force_instrumental toggle; 3 are hard-forced instrumental anyway (ignoring false would silently lie); 2 are prompt-only; 1 has a separate endpoint. It moves to provider-typed extras (ElevenLabsMusicGenerateRequest.forceInstrumental, etc.).

Request: before / after

// Before (0.6)
import * as MusicGenerator from "@effect-uai/core/MusicGenerator"
const result = yield* MusicGenerator.generate({
  model: "lyria-3-clip-preview",
  prompts: [
    { text: "minimal techno", weight: 1.0 },
    { text: "1980s synthwave", weight: 0.3 },
  ],
  bpm: 124,
  scale: "A_MINOR",
  instrumental: true,
  durationSeconds: 30,
  outputFormat: { container: "mp3", encoding: "mp3", sampleRate: 44100, channels: 2 },
})
// After (0.7)
import { Duration } from "effect"
import * as MusicGenerator from "@effect-uai/core/MusicGenerator"
const result = yield* MusicGenerator.generate({
  model: "lyria-3-clip-preview",
  // Single prompt string; if you want weighted blend semantics or
  // tempo/key hints, include them in your prompt text. No client-side
  // splicing happens on your behalf.
  prompt:
    "minimal techno blended with 1980s synthwave (lighter touch). " +
    "Instrumental, 124 BPM, A minor.",
  duration: Duration.seconds(30),
  outputFormat: { container: "mp3", encoding: "mp3", sampleRate: 44100, channels: 2 },
})
// result is now GenerateResult, not MusicResult directly:
yield* writeFile("out.mp3", result.primary.audio.bytes)

If you actually need typed weighted-prompt blend or BPM-as-structured-field, reach for the provider-typed service:

// Provider-typed surface; restores Lyria-RealTime knobs
import * as LyriaRealtimeGenerator from "@effect-uai/google/LyriaRealtimeGenerator"
// (Lyria RealTime adapter lands separately; reserved name today.)

AudioBlob switches to Duration

// Before
const blob: AudioBlob = {
  format: pcmFormat,
  bytes,
  durationSeconds: 0.5,
}
// After
import { Duration } from "effect"
const blob: AudioBlob = {
  format: pcmFormat,
  bytes,
  duration: Duration.millis(500),
}

The same rename applies anywhere your code carries an AudioBlob out of SpeechSynthesizer.synthesize and reads durationSeconds. The old field is gone; reading it is a type error.

MusicResult composes AudioBlob instead of extending it

// Before
const result: MusicResult = {
  format: mp3Format,
  bytes,
  durationSeconds: 30,
  lyrics: "[Verse]\n...",
  watermark: { kind: "synthid" },
}
result.bytes // ok
result.watermark?.kind // string
// After
const result: MusicResult = {
  audio: { format: mp3Format, bytes, duration: Duration.seconds(30) },
  lyrics: "[Verse]\n...",
  watermark: "synthid", // bare string union; no nested record
}
result.audio.bytes // composition wins
result.watermark // "synthid" | "c2pa" | (string & {})

This is composition over inheritance: audio is its own value, so you pass result.audio straight to anything that takes an AudioBlob, hash it, transcode it, without spreading. Adding fields to AudioBlob in the future can never conflict with fields on MusicResult. Watermark is a bare string-literal union — no provider returns additional metadata about its own watermark, so the nested record was carrying nothing.
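
If you have stored or fixture data in the 0.6 shape, the reshape can be done mechanically. The following is a minimal sketch with hypothetical local types (LegacyMusicResult, ReshapedMusicResult, migrateMusicResult); it assumes durationSeconds was optional in 0.6 and stays generic over the duration type, so the converter is passed in rather than imported.

```typescript
// Hypothetical stand-ins for the 0.6 and 0.7 shapes; F is the format type,
// D is whatever duration representation the caller wants.
interface LegacyMusicResult<F> {
  readonly format: F
  readonly bytes: Uint8Array
  readonly durationSeconds?: number
  readonly lyrics?: string
  readonly watermark?: { readonly kind: string }
}

interface ReshapedMusicResult<F, D> {
  readonly audio: { readonly format: F; readonly bytes: Uint8Array; readonly duration?: D }
  readonly lyrics?: string
  readonly watermark?: string
}

function migrateMusicResult<F, D>(
  old: LegacyMusicResult<F>,
  toDuration: (seconds: number) => D,
): ReshapedMusicResult<F, D> {
  return {
    // The audio fields move under `audio` as one composable value.
    audio: {
      format: old.format,
      bytes: old.bytes,
      ...(old.durationSeconds !== undefined ? { duration: toDuration(old.durationSeconds) } : {}),
    },
    ...(old.lyrics !== undefined ? { lyrics: old.lyrics } : {}),
    // The nested `{ kind }` record collapses to the bare string.
    ...(old.watermark !== undefined ? { watermark: old.watermark.kind } : {}),
  }
}
```

With Effect in scope, the converter argument would be Duration.seconds, as in migrateMusicResult(old, Duration.seconds).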

generate returns GenerateResult, not MusicResult

// Before
const r: MusicResult = yield* MusicGenerator.generate(req)
writeFile(out, r.bytes)
// After
const r: GenerateResult = yield* MusicGenerator.generate(req)
writeFile(out, r.primary.audio.bytes)
// Or, when the provider returns multiple variants:
for (const v of r.variants) {
  writeFile(`out-${v.songId ?? "?"}.mp3`, v.audio.bytes)
}

r.primary is always r.variants[0]. Today every in-tree provider (Lyria, ElevenLabs) returns exactly one variant. Mureka and Suno (and any future polling-based provider) return two; variants ensures the second isn’t dropped silently. Use singleVariant(result) in your own adapters to wrap a single MusicResult.
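
The relationship between primary and variants can be pictured with a small model. This is a sketch under assumptions: the -Like types and both functions are local stand-ins rather than the library's definitions, and the non-empty tuple type is one plausible way to guarantee that a primary always exists.

```typescript
// Local stand-ins; only `audio.bytes` and `songId` are taken from the snippet above.
interface MusicResultLike {
  readonly audio: { readonly bytes: Uint8Array }
  readonly songId?: string
}

type NonEmpty<A> = readonly [A, ...A[]]

interface GenerateResultLike {
  readonly primary: MusicResultLike
  readonly variants: NonEmpty<MusicResultLike>
}

// Sketch of what a singleVariant-style wrapper might do for one result.
function singleVariantSketch(result: MusicResultLike): GenerateResultLike {
  return { primary: result, variants: [result] }
}

// Sketch for providers that return several takes: the first one is primary.
function fromVariantsSketch(variants: NonEmpty<MusicResultLike>): GenerateResultLike {
  return { primary: variants[0], variants }
}
```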

MusicSessionInput drops config

The 0.6 union carried a config variant whose fields (density, brightness, muteBass, muteDrums, onlyBassAndDrums, musicGenerationMode, …) were all Lyria-RealTime-specific. Forcing hypothetical second bidi providers to either honor or silently ignore those fields is the same antipattern this release is trying to exit.

// Before
import { configInput, controlInput, promptsInput } from "@effect-uai/core/Music"
const inputs = Stream.fromIterable([
  promptsInput([{ text: "techno" }]),
  configInput({ bpm: 124, density: 0.8 }), // Lyria-RT-only
  controlInput("reset_context"), // Lyria-RT-only enum
])
// After
import { controlInput, promptsInput } from "@effect-uai/core/Music"
const inputs = Stream.fromIterable([
  promptsInput([{ text: "techno" }]),
  controlInput("reset"), // cross-protocol vocab
])
// For typed config knobs, switch to the Lyria-typed service input
// (LyriaRealtimeSessionInput extends MusicSessionInput with a `config`
// variant carrying LyriaRealtimeConfig).

The controlInput action enum is now "play" | "pause" | "stop" | "reset" (cross-protocol convergent vocabulary, see Web Audio / MIDI sequencer convention). Lyria’s "reset_context" action becomes "reset" on the generic surface.
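
For code that stores action names as strings (configuration files, recorded sessions), the rename can be applied with a small translation step. This is a hypothetical helper (migrateControlAction is not a library function); it knows only the one rename stated above and returns undefined for anything outside the new vocabulary, leaving the decision about unknown 0.6 actions to the caller.

```typescript
type ControlAction = "play" | "pause" | "stop" | "reset"

const GENERIC_ACTIONS: ReadonlyArray<string> = ["play", "pause", "stop", "reset"]

function migrateControlAction(legacy: string): ControlAction | undefined {
  // The only rename this guide documents.
  if (legacy === "reset_context") return "reset"
  // Names already in the generic vocabulary pass through unchanged.
  return GENERIC_ACTIONS.includes(legacy) ? (legacy as ControlAction) : undefined
}
```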

streamGenerationFrom output: AudioChunk becomes MusicStreamEvent

Server-side warnings (filteredPrompt, warning) used to be logged out-of-band. They’re now in-band events alongside audio chunks, same pattern as TurnEvent on the LLM surface.

// Before
const audio: Stream<AudioChunk> = inputs.pipe(MusicGenerator.streamGenerationFrom(req))
// After
import { isAudioEvent } from "@effect-uai/core/Music"
const events: Stream<MusicStreamEvent> = inputs.pipe(MusicGenerator.streamGenerationFrom(req))
// If you only want chunks, filterMap:
const audio = events.pipe(
  Stream.filterMap((e) => (isAudioEvent(e) ? Option.some(e.chunk) : Option.none())),
)

Mock script shape changed

MockMusicGenerator script types follow the new shapes:

// Before
MockMusicGenerator.layer({
  results: [{ format, bytes, durationSeconds: 30, watermark: { kind: "synthid" } }],
  streamGenerationFromChunks: [[chunk(1), chunk(2)]],
})
// After
import { audioEvent, singleVariant } from "@effect-uai/core/Music"
MockMusicGenerator.layer({
  results: [
    singleVariant({
      audio: { format, bytes, duration: Duration.seconds(30) },
      watermark: "synthid",
    }),
  ],
  streamGenerationFromEvents: [[audioEvent(chunk(1)), audioEvent(chunk(2))]],
})

Language model and Lyria behavior fixes (no rewrites)

These change observable behavior but require no renames or type-level rewrites. Audit call sites that relied on the old behavior.

  • Gemini toolChoice is now honored. It was hardcoded to AUTO, ignoring the request. It now maps onto Gemini’s functionCallingConfig: auto to AUTO, required to ANY, none to NONE, and a named { type: "function", name } to ANY plus allowedFunctionNames. Code that set toolChoice and relied on it being ignored will now see it applied.
  • Gemini URL images now fail Unsupported. A url-source image was silently dropped from the request (Gemini needs URL images pre-uploaded via the Files API). It now fails AiError.Unsupported. Pass the image as base64 or raw bytes instead.
  • Lyria clip reports the format it produces. lyria-3-clip-preview is fixed at mp3 and has no output-format wire field. Requesting container: "wav" on the clip model previously failed with a per-model Unsupported; it now returns mp3, and the result’s audio.format honestly reflects mp3 (no mislabeling). lyria-3-pro-preview still honors the requested container.
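
The toolChoice mapping in the first bullet is small enough to write out. The sketch below is not the adapter's code: the ToolChoice literals and the default of "auto" when the field is omitted are assumptions, and toFunctionCallingConfig is a hypothetical name. The target shape follows Gemini's functionCallingConfig with its mode and allowedFunctionNames fields.

```typescript
// Assumed request-side shape; the named variant is the one quoted above.
type ToolChoice =
  | "auto"
  | "required"
  | "none"
  | { readonly type: "function"; readonly name: string }

interface FunctionCallingConfig {
  readonly mode: "AUTO" | "ANY" | "NONE"
  readonly allowedFunctionNames?: ReadonlyArray<string>
}

function toFunctionCallingConfig(choice: ToolChoice = "auto"): FunctionCallingConfig {
  // A named function forces a call (ANY) and restricts it to that one name.
  if (typeof choice === "object") {
    return { mode: "ANY", allowedFunctionNames: [choice.name] }
  }
  switch (choice) {
    case "auto":
      return { mode: "AUTO" }
    case "required":
      return { mode: "ANY" }
    case "none":
      return { mode: "NONE" }
  }
}
```

Call sites that passed "none" or "required" in 0.6 were effectively running with AUTO, so this is the mapping to keep in mind when a prompt starts behaving differently after the upgrade.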

New capabilities (additive — no migration needed)

@effect-uai/elevenlabs/ElevenLabsMusicGenerator

ElevenLabs Music as a second MusicGenerator provider, plus a typed extras surface for compositionPlan / forceInstrumental / signWithC2pa / respectSectionsDurations, and a createCompositionPlan helper for the free POST /v1/music/plan endpoint. Registers the generic MusicGenerator tag and ships native chunked HTTP streaming via POST /v1/music/stream. Does not register MusicInteractiveSession (no bidi).

import { layer as elevenlabsMusicLayer } from "@effect-uai/elevenlabs/ElevenLabsMusicGenerator"
const app = program.pipe(
  Effect.provide(
    elevenlabsMusicLayer({ apiKey: Redacted.make(process.env.ELEVENLABS_API_KEY!) }).pipe(
      Layer.provide(FetchHttpClient.layer),
    ),
  ),
)

@effect-uai/core/Capabilities

Centralized warn-and-drop helper for bucket-2 fields (provider has no structured wire field; output is still valid, just less aligned with the caller’s hint). Use it in adapters instead of sprinkling Effect.logWarning calls with ad-hoc shapes.

import { warnDroppedWhen } from "@effect-uai/core/Capabilities"
yield* Effect.all(
  [
    warnDroppedWhen(request.lyrics, {
      provider: "lyria",
      capability: "lyrics",
      field: "lyrics",
      reason: "Lyria 3 sync has no lyrics wire field; embed in prompt instead.",
    }),
    warnDroppedWhen(request.seed, {
      provider: "lyria",
      capability: "seed",
      field: "seed",
      reason:
        "Lyria 3 (Gemini) does not expose seed; use Lyria 2 (Vertex) for seeded generation.",
    }),
  ],
  { discard: true },
)

Emits a structured CapabilityWarning via Effect.logWarning. The shape is stable; the event type may be promoted to a typed AiError variant in a future release if consumers need to pattern-match on it.
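
As a mental model, warn-and-drop can be pictured without Effect as "one warning per field the caller actually set". The sketch below is a plain-TypeScript stand-in, not the helper's implementation: collectDropped and CapabilityWarningLike are hypothetical, and treating "set" as "not undefined" is an assumption.

```typescript
// Same four fields as the metadata object in the example above.
interface CapabilityWarningLike {
  readonly provider: string
  readonly capability: string
  readonly field: string
  readonly reason: string
}

// Each check pairs a request value with the warning to emit if it was set.
type DropCheck = readonly [unknown, CapabilityWarningLike]

function collectDropped(checks: ReadonlyArray<DropCheck>): CapabilityWarningLike[] {
  return checks
    .filter(([value]) => value !== undefined) // assumption: undefined means "not set"
    .map(([, warning]) => warning)
}
```

The request still goes out without the dropped fields; only the warnings change, which is what separates bucket 2 from the bucket-1 fields that fail with Unsupported.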

Recipe basic-music-generation now supports both providers

GOOGLE_API_KEY=... pnpm tsx recipes/basic-music-generation/run-node.ts --provider=google
ELEVENLABS_API_KEY=... pnpm tsx recipes/basic-music-generation/run-node.ts --provider=elevenlabs
ELEVENLABS_API_KEY=... pnpm tsx recipes/basic-music-generation/run-node.ts --provider=elevenlabs ./my-prompt.txt

Writes out-google.mp3 / out-elevenlabs.mp3 next to the recipe. The recipe body itself stays provider-agnostic (yields the generic MusicGenerator); the runner picks the Layer based on the flag.