Migrating to 0.7
0.7 is a capability-honesty pass across every audio and embedding
surface. The unifying rule: stop lying. Where a provider cannot honor a
request, the call now fails with AiError.Unsupported (load-bearing
gaps) or emits a structured warnDropped (best-effort hints), instead
of silently substituting a different result. Alongside that, Duration
replaces raw durationSeconds everywhere audio carries a length, the
MusicGenerator surface is reshaped, and an ElevenLabs music provider
lands.
Most of it is mechanical (find-and-replace renames plus a
Duration.seconds(n) wrap). The parts that need judgement are the
removed GeminiTranscriber (use OpenAI / ElevenLabs / Inworld instead)
and the requests that now error where they previously degraded silently.
The additive parts (new ElevenLabs music adapter, the Capabilities
warn-and-drop helper, ElevenLabs pronunciation dictionaries, the new
recipe runner) need no migration. See New capabilities at the end.
At a glance
| Area | What changed | Mechanical? |
|---|---|---|
| Audio | AudioBlob.durationSeconds: number becomes duration: Duration.Duration (flows through STT TranscriptResult, TTS, music) | find-and-replace + Duration helpers |
| STT | GeminiTranscriber removed; prompt splits into prompt + biasingTerms; TranscriptResult.durationSeconds becomes duration; stream format gap now Unsupported | drop Gemini STT, rename fields |
| TTS | PhoneticEncoding + CustomPronunciation.encoding removed (IPA-only); pronunciations now fail Unsupported on providers without an IPA path; DialogueTurn trimmed to { voiceId, text } | drop encoding, handle Unsupported or switch |
| Embeddings | EmbedEncoding trimmed to float32 \| int8 \| binary (sparse / multivector move to JinaEncoding); mismatched encoding / image / multi-part now Unsupported | move non-dense calls to the typed JinaEmbedding |
| Music | prompts to prompt, drop bpm / scale / instrumental, MusicResult composes AudioBlob, generate returns GenerateResult, stream emits MusicStreamEvent | renames + .primary / .audio at call sites |
| LLM | Gemini toolChoice now mapped (was forced AUTO); Gemini URL images now Unsupported; Lyria clip honestly reports mp3 | behavior only, no rename |
The audio, STT, TTS, and embedding sections come first because they touch the most call sites; the music reshape follows; the LLM behavior fixes (no rewrites) are last.
Speech-to-text (Transcriber)
GeminiTranscriber is removed
@effect-uai/google’s GeminiTranscriber rode on :generateContent
(an LLM with a hardcoded “transcribe” prompt), not a real transcription
endpoint. It had no native word timestamps, no diarization, no
structured transcription semantics. It is deleted entirely.

    // Before (0.6): no longer resolves
    import * as GeminiTranscriber from "@effect-uai/google/GeminiTranscriber"

Use an in-tree transcription provider instead: OpenAITranscriber
(@effect-uai/openai), ElevenLabsTranscriber (@effect-uai/elevenlabs,
diarization on the sync method), or InworldTranscriber
(@effect-uai/inworld). Real Google STT (Cloud Speech-to-Text V2 /
Chirp 3) is designed but not yet in-tree.
prompt splits into prompt + biasingTerms
The old prompt?: string | { terms: string[] } union conflated two
orthogonal mechanisms. They are now separate fields on
CommonTranscribeRequest: prompt is a free-form prose context hint
(OpenAI honors it), biasingTerms is discrete vocabulary to boost
(ElevenLabs keyterms, Inworld prompts). Each maps to a structured
wire field or warnDropped; neither is stuffed into the other.
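If you have many 0.6 call sites, the split can be mechanized with a small caller-side helper. The following is a minimal sketch, not part of the library: LegacyPrompt, SplitPrompt, and splitLegacyPrompt are hypothetical names, and only the prompt and biasingTerms field names come from the request shape described above.

```typescript
// Hypothetical migration helper; not part of @effect-uai.
// The 0.6 union is modeled locally so the sketch is self-contained.
type LegacyPrompt = string | { terms: string[] } | undefined

interface SplitPrompt {
  prompt?: string
  biasingTerms?: string[]
}

// Route a 0.6-style prompt value to the 0.7 field it belongs to.
function splitLegacyPrompt(legacy: LegacyPrompt): SplitPrompt {
  if (legacy === undefined) return {}
  if (typeof legacy === "string") return { prompt: legacy }
  return { biasingTerms: legacy.terms.slice() }
}

const fromTerms = splitLegacyPrompt({ terms: ["Anthropic", "Effect"] })
const fromProse = splitLegacyPrompt("A podcast about functional programming.")
console.log(fromTerms, fromProse)
```

Spread the returned object into the 0.7 request so that a terms list never ends up on prompt and prose never ends up on biasingTerms.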

    // Before (0.6)
    transcribe({ model: "whisper-1", prompt: { terms: ["Anthropic", "Effect"] } })

    // After (0.7)
    transcribe({ model: "whisper-1", biasingTerms: ["Anthropic", "Effect"] })

    // Prose context hints stay on `prompt`:
    transcribe({ model: "whisper-1", prompt: "A podcast about functional programming." })

TranscriptResult.durationSeconds becomes duration

    // Before (0.6)
    const secs = result.durationSeconds // number | undefined

    // After (0.7)
    import { Duration } from "effect"
    const secs = result.duration ? Duration.toSeconds(result.duration) : undefined

Per-word offsets (WordTimestamp.startSeconds / endSeconds) stay raw
number: they are positions in the audio, not durations.
Behavior changes (no rename)
- Stream input-format gaps now fail AiError.Unsupported (a per-Layer capability gap) instead of InvalidRequest. Update catch clauses that match on the error tag.
- OpenAI: diarization is narrowed off OpenAITranscribeRequest (OpenAI has no diarization). The old proactive per-model wordTimestamps guard is gone: it stays allowed, and a non-whisper-1 model now surfaces the provider’s wire 400 rather than a pre-send Unsupported.
- prompt / biasingTerms on a provider that lacks the field now emit a warnDropped (OpenAI drops biasingTerms; ElevenLabs / Inworld drop prompt). The call still succeeds.
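The tag change in the first item matters mostly for recovery logic. The sketch below models the error channel with a local discriminated union to show the extra branch a 0.6 handler needs. SketchError, its fields, and the _tag discriminant are assumptions standing in for the real AiError variants, whose exact shape is not shown here.

```typescript
// Local stand-in for the error channel; the real AiError variants and
// their fields may differ. The `_tag` discriminant is an assumption.
type SketchError =
  | { _tag: "Unsupported"; capability: string }
  | { _tag: "InvalidRequest"; message: string }

type Recovery = "switch-provider" | "fix-request"

// A 0.6 handler that treated a stream format gap as InvalidRequest needs
// an Unsupported branch in 0.7: the request is well-formed, the Layer
// simply cannot honor it.
function recover(error: SketchError): Recovery {
  switch (error._tag) {
    case "Unsupported":
      return "switch-provider"
    case "InvalidRequest":
      return "fix-request"
  }
}

const formatGap: SketchError = { _tag: "Unsupported", capability: "stream-input-format" }
console.log(recover(formatGap)) // "switch-provider"
```

Keeping the two tags on separate branches lets the caller retry on another provider for a capability gap while still surfacing genuinely malformed requests.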
Speech synthesis (TTS)
Pronunciations are IPA-only and load-bearing
PhoneticEncoding and the CustomPronunciation.encoding field are
removed. pronunciation is always IPA; adapters translate to the
provider’s wire form (inline /ipa/, SSML <phoneme>, or X-SAMPA)
internally.

    // Before (0.6)
    pronunciations: [{ phrase: "Anthropic", pronunciation: "ænˈθrɒpɪk", encoding: "ipa" }]

    // After (0.7)
    pronunciations: [{ phrase: "Anthropic", pronunciation: "ænˈθrɒpɪk" }]

Because a mispronounced word is wrong audio, pronunciations are now
load-bearing (bucket 1): a provider with no stateless IPA path fails the
call with AiError.Unsupported instead of silently dropping them.
- Inworld supports inline IPA (no change).
- OpenAI, Gemini, and modern ElevenLabs now fail Unsupported when pronunciations is non-empty. Remove the field, switch to Inworld, or (ElevenLabs) provision a dictionary and pass pronunciationDictionaryLocators (see New capabilities).
DialogueTurn trims to { voiceId, text }
styleDescription and speed were Hume-specific and silently ignored
by the only in-tree dialogue endpoint (ElevenLabs /v1/text-to-dialogue).
They are removed; they will return on a provider-typed turn extension
when a Hume adapter lands.

    // Before (0.6)
    { voiceId: "Rachel", text: "Hello.", styleDescription: "warm", speed: 0.9 }

    // After (0.7)
    { voiceId: "Rachel", text: "Hello." }

Behavior changes (no rename)
- AudioBlob.duration (see Audio below): synthesize results read .duration, not .durationSeconds.
- OpenAI / Gemini now warnDropped for languageCode (both auto-detect) and Gemini also for speed. The audio still renders.
Embeddings
EmbedEncoding is trimmed to the dense cross-provider set

    // Before (0.6)
    type EmbedEncoding = "float32" | "int8" | "binary" | "sparse" | "multivector"

    // After (0.7)
    type EmbedEncoding = "float32" | "int8" | "binary"

sparse and multivector are Jina-only structures, not a cross-provider
request set. They move to the provider-typed JinaEncoding. If you passed
them to the generic embed / embedMany, reach for the typed
JinaEmbedding service:

    // Before (0.6)
    import { embed } from "@effect-uai/core/EmbeddingModel"
    yield* embed({ model: "jina-embeddings-v4", input, encoding: "multivector" })

    // After (0.7)
    import { JinaEmbedding } from "@effect-uai/jina/JinaEmbedding"
    const jina = yield* JinaEmbedding.asEffect()
    yield* jina.embed({ model: "jina-embeddings-v4", input, encoding: "multivector" })

Responses are unaffected: EmbedResponse<E> / EmbedManyResponse<E> now
range over the wider ResponseEncoding (float32 | int8 | binary | sparse | multivector),
so a typed Jina response still carries sparse / multivector.
Mismatched encoding, image, and multi-part now fail honestly
The generic path used to take an unsupported encoding and return a
float32 vector mislabeled as the requested type (a silent cast-lie). It
now validates via the exported assertEncoding guard and fails
Unsupported:
- OpenAI / Gemini: any non-float32 encoding fails Unsupported (both emit float32 only). Omit encoding or pass "float32".
- Jina (generic path): scalar int8 fails Unsupported. Jina honors float32 and binary (bit-quantized, packed into bytes), not scalar int8 per dimension.
- OpenAI image input and Jina multi-part input now fail Unsupported (was InvalidRequest): they are capability gaps, not wire-shape errors. Update error handlers that match on the tag.
- OpenAI task now warnDropped (no task field on any OpenAI model). Gemini task stays silent (honored on some models, ignored on others; no per-model tables).
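The per-provider encoding rules in the list above amount to a lookup followed by a fail-fast check. The sketch below illustrates that idea only. It is not the exported assertEncoding guard, whose signature is not shown in this guide; checkEncoding, the supported table, and the result shape are hypothetical.

```typescript
type EmbedEncoding = "float32" | "int8" | "binary"
type Provider = "openai" | "gemini" | "jina"

// Encodings each generic path honors, as listed above.
const supported: Record<Provider, ReadonlyArray<EmbedEncoding>> = {
  openai: ["float32"],
  gemini: ["float32"],
  jina: ["float32", "binary"],
}

type Check =
  | { ok: true; encoding: EmbedEncoding }
  | { ok: false; tag: "Unsupported"; reason: string }

// Fail instead of returning a float32 vector labeled as something else.
function checkEncoding(provider: Provider, requested: EmbedEncoding = "float32"): Check {
  if (supported[provider].some((e) => e === requested)) {
    return { ok: true, encoding: requested }
  }
  return { ok: false, tag: "Unsupported", reason: `${provider} cannot emit ${requested}` }
}

console.log(checkEncoding("openai", "int8"))
console.log(checkEncoding("jina", "binary"))
```

Running a check like this before sending the request is what turns the old silent cast into an explicit failure the caller can handle.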
Music generation
The trimmed common request
The bpm, scale, and WeightedPrompt[] fields were on the Common
request because they were native to Google Lyria RealTime. They aren’t
native to any of the other 9 productized providers we surveyed
(cross-provider matrix).
The Lyria 3 sync adapter was silently splicing them into your prompt
text on your behalf — that’s client-side prompt construction and the
0.7 rule is don’t do that. Prompts always come from the caller
verbatim.
instrumental was on the Common request but ran into a similar
problem: only 4 of 10 providers expose a structured force_instrumental
toggle; 3 are hard-forced instrumental anyway (ignoring false would
silently lie); 2 are prompt-only; 1 has a separate endpoint. It moves to
provider-typed extras (ElevenLabsMusicGenerateRequest.forceInstrumental,
etc.).
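Since the adapter no longer builds prompt text for you, any folding of the old fields into a prompt has to live in your own code. The following is a minimal sketch of such a caller-side helper. LegacyMusicFields and toPromptText are hypothetical, and the wording it produces is only one possible choice; weights have no structured equivalent, so the sketch merely orders prompts by weight.

```typescript
// Hypothetical 0.6-shaped fields, modeled locally for the sketch.
interface LegacyMusicFields {
  prompts: ReadonlyArray<{ text: string; weight?: number }>
  bpm?: number
  scale?: string
  instrumental?: boolean
}

// Build one prompt string on the caller's side. Heavier prompts come
// first; tempo, key, and the instrumental hint are appended as text.
function toPromptText(legacy: LegacyMusicFields): string {
  const blend = legacy.prompts
    .slice()
    .sort((a, b) => (b.weight ?? 1) - (a.weight ?? 1))
    .map((p) => p.text)
    .join(", ")
  const hints: string[] = []
  if (legacy.instrumental) hints.push("Instrumental")
  if (legacy.bpm !== undefined) hints.push(`${legacy.bpm} BPM`)
  if (legacy.scale !== undefined) hints.push(legacy.scale.replace("_", " ").toLowerCase())
  return hints.length > 0 ? `${blend}. ${hints.join(", ")}.` : blend
}

const promptText = toPromptText({
  prompts: [
    { text: "1980s synthwave", weight: 0.3 },
    { text: "minimal techno", weight: 1.0 },
  ],
  bpm: 124,
  scale: "A_MINOR",
  instrumental: true,
})
console.log(promptText)
```

Because the text is assembled in your code, you decide how a weight or key is phrased, and the provider receives exactly the string you wrote.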
Request: before / after

    // Before (0.6)
    import * as MusicGenerator from "@effect-uai/core/MusicGenerator"

    const result = yield* MusicGenerator.generate({
      model: "lyria-3-clip-preview",
      prompts: [
        { text: "minimal techno", weight: 1.0 },
        { text: "1980s synthwave", weight: 0.3 },
      ],
      bpm: 124,
      scale: "A_MINOR",
      instrumental: true,
      durationSeconds: 30,
      outputFormat: { container: "mp3", encoding: "mp3", sampleRate: 44100, channels: 2 },
    })

    // After (0.7)
    import { Duration } from "effect"
    import * as MusicGenerator from "@effect-uai/core/MusicGenerator"

    const result = yield* MusicGenerator.generate({
      model: "lyria-3-clip-preview",
      // Single prompt string; if you want weighted blend semantics or
      // tempo/key hints, include them in your prompt text. No client-side
      // splicing happens on your behalf.
      prompt:
        "minimal techno blended with 1980s synthwave (lighter touch). " +
        "Instrumental, 124 BPM, A minor.",
      duration: Duration.seconds(30),
      outputFormat: { container: "mp3", encoding: "mp3", sampleRate: 44100, channels: 2 },
    })

    // result is now GenerateResult, not MusicResult directly:
    yield* writeFile("out.mp3", result.primary.audio.bytes)

If you actually need typed weighted-prompt blend or BPM-as-structured-field, reach for the provider-typed service:

    // Provider-typed surface; restores Lyria-RealTime knobs
    import * as LyriaRealtimeGenerator from "@effect-uai/google/LyriaRealtimeGenerator"
    // (Lyria RealTime adapter lands separately; reserved name today.)

AudioBlob switches to Duration

    // Before
    const blob: AudioBlob = {
      format: pcmFormat,
      bytes,
      durationSeconds: 0.5,
    }

    // After
    import { Duration } from "effect"

    const blob: AudioBlob = {
      format: pcmFormat,
      bytes,
      duration: Duration.millis(500),
    }

The same rename applies anywhere your code carries an AudioBlob
out of SpeechSynthesizer.synthesize and reads durationSeconds. The
old field is gone; reading it is a type error.
MusicResult composes AudioBlob instead of extending it

    // Before
    const result: MusicResult = {
      format: mp3Format,
      bytes,
      durationSeconds: 30,
      lyrics: "[Verse]\n...",
      watermark: { kind: "synthid" },
    }
    result.bytes // ok
    result.watermark?.kind // string

    // After
    const result: MusicResult = {
      audio: { format: mp3Format, bytes, duration: Duration.seconds(30) },
      lyrics: "[Verse]\n...",
      watermark: "synthid", // bare string union; no nested record
    }
    result.audio.bytes // composition wins
    result.watermark // "synthid" | "c2pa" | (string & {})

This is composition over inheritance: audio is its own value, so
you can pass result.audio straight to anything that takes an
AudioBlob (hash it, transcode it) without spreading. Adding fields
to AudioBlob in the future can never conflict with fields on
MusicResult. Watermark is a bare string-literal union: no
provider returns additional metadata about its own watermark, so the
nested record was carrying nothing.
generate returns GenerateResult, not MusicResult

    // Before
    const r: MusicResult = yield* MusicGenerator.generate(req)
    writeFile(out, r.bytes)

    // After
    const r: GenerateResult = yield* MusicGenerator.generate(req)
    writeFile(out, r.primary.audio.bytes)

    // Or, when the provider returns multiple variants:
    for (const v of r.variants) {
      writeFile(`out-${v.songId ?? "?"}.mp3`, v.audio.bytes)
    }

r.primary === r.variants[0]. Today every in-tree provider (Lyria,
ElevenLabs) returns exactly one variant. Mureka and Suno (and any
future polling-based provider) return two; variants ensures the
second isn’t dropped silently. Use singleVariant(result) in your
own adapters to wrap a single MusicResult.
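The primary / variants relationship can be illustrated without the library. The sketch below uses local stand-in types, trimmed to the fields it needs. wrapSingle and wrapMany are hypothetical re-implementations of the idea, not the library's singleVariant, and the real type definitions may carry more fields.

```typescript
// Local stand-ins for the 0.7 shapes; field sets are trimmed to what
// this sketch needs and may not match the library's definitions.
interface SketchAudio {
  bytes: Uint8Array
}
interface SketchMusicResult {
  audio: SketchAudio
  songId?: string
}
interface SketchGenerateResult {
  primary: SketchMusicResult
  variants: ReadonlyArray<SketchMusicResult>
}

// One result: primary and variants[0] are the same object.
function wrapSingle(result: SketchMusicResult): SketchGenerateResult {
  return { primary: result, variants: [result] }
}

// A multi-variant provider keeps every result addressable.
function wrapMany(first: SketchMusicResult, ...rest: SketchMusicResult[]): SketchGenerateResult {
  return { primary: first, variants: [first].concat(rest) }
}

const one = wrapSingle({ audio: { bytes: new Uint8Array([1, 2, 3]) } })
const two = wrapMany(
  { audio: { bytes: new Uint8Array([1]) }, songId: "a" },
  { audio: { bytes: new Uint8Array([2]) }, songId: "b" },
)
console.log(one.primary === one.variants[0], two.variants.length)
```

Call sites that only care about one clip read primary; call sites that want everything iterate variants, and neither has to know how many clips the provider produced.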
MusicSessionInput drops config
The 0.6 union carried a config variant whose fields (density,
brightness, muteBass, muteDrums, onlyBassAndDrums,
musicGenerationMode, …) were all Lyria-RealTime-specific. Forcing
hypothetical second bidi providers to either honor or silently ignore
those fields is the same antipattern this release is trying to exit.

    // Before
    import { configInput, controlInput, promptsInput } from "@effect-uai/core/Music"

    const inputs = Stream.fromIterable([
      promptsInput([{ text: "techno" }]),
      configInput({ bpm: 124, density: 0.8 }), // Lyria-RT-only
      controlInput("reset_context"), // Lyria-RT-only enum
    ])

    // After
    import { controlInput, promptsInput } from "@effect-uai/core/Music"

    const inputs = Stream.fromIterable([
      promptsInput([{ text: "techno" }]),
      controlInput("reset"), // cross-protocol vocab
    ])

    // For typed config knobs, switch to the Lyria-typed service input
    // (LyriaRealtimeSessionInput extends MusicSessionInput with a `config`
    // variant carrying LyriaRealtimeConfig).

The controlInput action enum is now
"play" | "pause" | "stop" | "reset" (cross-protocol convergent
vocabulary, see Web Audio / MIDI sequencer convention). Lyria’s
"reset_context" action becomes "reset" on the generic surface.
streamGenerationFrom output: AudioChunk → MusicStreamEvent
Server-side warnings (filteredPrompt, warning) used to be logged
out-of-band. They’re now in-band events alongside audio chunks, same
pattern as TurnEvent on the LLM surface.

    // Before
    const audio: Stream<AudioChunk> = inputs.pipe(MusicGenerator.streamGenerationFrom(req))

    // After
    import { isAudioEvent } from "@effect-uai/core/Music"

    const events: Stream<MusicStreamEvent> = inputs.pipe(MusicGenerator.streamGenerationFrom(req))

    // If you only want chunks, filterMap:
    const audio = events.pipe(
      Stream.filterMap((e) => (isAudioEvent(e) ? Option.some(e.chunk) : Option.none())),
    )

Mock script shape changed
MockMusicGenerator script types follow the new shapes:

    // Before
    MockMusicGenerator.layer({
      results: [{ format, bytes, durationSeconds: 30, watermark: { kind: "synthid" } }],
      streamGenerationFromChunks: [[chunk(1), chunk(2)]],
    })

    // After
    import { audioEvent, singleVariant } from "@effect-uai/core/Music"

    MockMusicGenerator.layer({
      results: [
        singleVariant({
          audio: { format, bytes, duration: Duration.seconds(30) },
          watermark: "synthid",
        }),
      ],
      streamGenerationFromEvents: [[audioEvent(chunk(1)), audioEvent(chunk(2))]],
    })

Language model and Lyria behavior fixes (no rewrites)
These change observable behavior but require no code changes.
- Gemini toolChoice is now honored. It was hardcoded to AUTO, ignoring the request. It now maps onto Gemini’s functionCallingConfig: auto to AUTO, required to ANY, none to NONE, and a named { type: "function", name } to ANY plus allowedFunctionNames. Code that set toolChoice and relied on it being ignored will now see it applied.
- Gemini URL images now fail Unsupported. A url-source image was silently dropped from the request (Gemini needs URL images pre-uploaded via the Files API). It now fails AiError.Unsupported. Pass the image as base64 or raw bytes instead.
- Lyria clip reports the format it produces. lyria-3-clip-preview is fixed at mp3 and has no output-format wire field. Requesting container: "wav" on the clip model previously failed with a per-model Unsupported; it now returns mp3, and the result’s audio.format honestly reflects mp3 (no mislabeling). lyria-3-pro-preview still honors the requested container.
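To check whether existing toolChoice values will start changing behavior, it can help to see the mapping from the first item written out. The sketch below restates it as a pure function. The ToolChoice and FunctionCallingConfig types are local stand-ins, toFunctionCallingConfig is a hypothetical name, and the mode field name is an assumption about the wire shape; the adapter's actual implementation may differ.

```typescript
// Request-side shape as described above; modeled locally.
type ToolChoice = "auto" | "required" | "none" | { type: "function"; name: string }

// Stand-in for the target shape; `mode` is an assumed field name.
interface FunctionCallingConfig {
  mode: "AUTO" | "ANY" | "NONE"
  allowedFunctionNames?: string[]
}

function toFunctionCallingConfig(choice: ToolChoice): FunctionCallingConfig {
  if (typeof choice === "object") {
    // A named function is expressed as ANY restricted to one name.
    return { mode: "ANY", allowedFunctionNames: [choice.name] }
  }
  switch (choice) {
    case "auto":
      return { mode: "AUTO" }
    case "required":
      return { mode: "ANY" }
    case "none":
      return { mode: "NONE" }
  }
}

console.log(toFunctionCallingConfig("required"))
console.log(toFunctionCallingConfig({ type: "function", name: "lookupWeather" }))
```

A request that set toolChoice to "none" and still expected tool calls under 0.6 is the case most likely to surprise after the upgrade.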
New capabilities (additive — no migration needed)
@effect-uai/elevenlabs/ElevenLabsMusicGenerator
ElevenLabs Music as a second MusicGenerator provider, plus a typed
extras surface for compositionPlan / forceInstrumental /
signWithC2pa / respectSectionsDurations, and a createCompositionPlan
helper for the free POST /v1/music/plan endpoint. Registers the
generic MusicGenerator tag and ships native chunked HTTP streaming
via POST /v1/music/stream. Does not register MusicInteractiveSession
(no bidi).
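
    import { FetchHttpClient } from "@effect/platform"
    import { Effect, Layer, Redacted } from "effect"
    import { layer as elevenlabsMusicLayer } from "@effect-uai/elevenlabs/ElevenLabsMusicGenerator"

    // Provide the ElevenLabs layer, backed by the fetch-based HTTP client.
    const app = program.pipe(
      Effect.provide(
        elevenlabsMusicLayer({ apiKey: Redacted.make(process.env.ELEVENLABS_API_KEY!) }).pipe(
          Layer.provide(FetchHttpClient.layer),
        ),
      ),
    )

@effect-uai/core/Capabilities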
Centralized warn-and-drop helper for bucket-2 fields (provider has no
structured wire field; output is still valid, just less aligned with
the caller’s hint). Use it in adapters instead of sprinkling
Effect.logWarning calls with ad-hoc shapes.

    import { warnDroppedWhen } from "@effect-uai/core/Capabilities"

    yield* Effect.all(
      [
        warnDroppedWhen(request.lyrics, {
          provider: "lyria",
          capability: "lyrics",
          field: "lyrics",
          reason: "Lyria 3 sync has no lyrics wire field; embed in prompt instead.",
        }),
        warnDroppedWhen(request.seed, {
          provider: "lyria",
          capability: "seed",
          field: "seed",
          reason: "Lyria 3 (Gemini) does not expose seed; use Lyria 2 (Vertex) for seeded generation.",
        }),
      ],
      { discard: true },
    )

Emits a structured CapabilityWarning via Effect.logWarning. The
shape is stable; the event type may be promoted to a typed AiError
variant in a future release if consumers need to pattern-match on it.
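The warn-and-drop idea itself is small enough to sketch without Effect. The version below collects warnings into an array instead of logging them, and assumes the helper warns only when the caller actually set the field. DropInfo and collectDropped are hypothetical names; the real helper returns an Effect and its warning shape may carry more than these four fields.

```typescript
// Hypothetical stand-in for the warn-and-drop idea. The real helper
// returns an Effect and logs through Effect.logWarning.
interface DropInfo {
  provider: string
  capability: string
  field: string
  reason: string
}

// Record a warning only when the caller actually set the field
// (an assumption about the real helper's behavior).
function collectDropped<T>(value: T | undefined, info: DropInfo, sink: DropInfo[]): void {
  if (value !== undefined) sink.push(info)
}

const warnings: DropInfo[] = []
const sketchRequest: { lyrics?: string; seed?: number } = { lyrics: "[Verse]\n..." }

collectDropped(
  sketchRequest.lyrics,
  { provider: "lyria", capability: "lyrics", field: "lyrics", reason: "no lyrics wire field" },
  warnings,
)
collectDropped(
  sketchRequest.seed,
  { provider: "lyria", capability: "seed", field: "seed", reason: "seed not exposed" },
  warnings,
)

// Only the field that was set produces a warning.
console.log(warnings.map((w) => w.field))
```

Keeping every warning in one fixed shape is what lets log consumers filter on provider or field instead of parsing free-form messages.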
Recipe basic-music-generation now supports both providers

    GOOGLE_API_KEY=... pnpm tsx recipes/basic-music-generation/run-node.ts --provider=google
    ELEVENLABS_API_KEY=... pnpm tsx recipes/basic-music-generation/run-node.ts --provider=elevenlabs
    ELEVENLABS_API_KEY=... pnpm tsx recipes/basic-music-generation/run-node.ts --provider=elevenlabs ./my-prompt.txt

Writes out-google.mp3 / out-elevenlabs.mp3 next to the recipe.
The recipe body itself stays provider-agnostic (yields the generic
MusicGenerator); the runner picks the Layer based on the flag.