Music generation
A short text prompt can be enough for a usable clip.
Music generation is the audio sibling of image generation: describe the thing you want, get media bytes back. The current implementation is Google Lyria, but the service is shaped so other providers can hide their own polling or streaming details behind the same calls.
Scenario. You need background music for a video, a quick demo
track, or a musical sketch. Start with a prompt like
"Lo-fi piano with brushed drums, 70 bpm". Add weighted influences,
lyrics, BPM, or key hints only when you need more control.
Three Modes
    import { generate, streamGeneration, streamGenerationFrom } from "@effect-uai/core/MusicGenerator"

- `generate`: sync. Prompt in, full `MusicResult` (audio blob + optional `lyrics` / `sections` / `watermark`) out. Async / poll-based providers (Lyria 3 sync, Suno, Mureka) hide their poll loop inside the adapter, so the caller still sees a single `Effect`.
- `streamGeneration`: prompt in, `Stream<AudioChunk>` out. Providers without a native chunked endpoint emit a single chunk; bidi providers stream natively.
- `streamGenerationFrom`: bidirectional. A `Stream<MusicSessionInput>` pushes prompt blends, config deltas, and playback controls into the session; a `Stream<AudioChunk>` streams audio back. Gated by the `MusicInteractiveSession` capability marker.

Start with generate. Reach for the streaming shapes when a provider
can produce audio incrementally or when an interactive session needs to
change prompts while music is playing.
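
To make the single-chunk emulation concrete, here is a minimal sketch in plain TypeScript with no library imports. It rests on several assumptions: `AudioChunk` and `MusicResult` are reduced to a stand-in `bytes` field, a synchronous generator stands in for the real `Stream`, and `generateSync`, `streamGenerationEmulated`, `streamGenerationNative`, and `collect` are hypothetical names that are not part of `@effect-uai/core`.

```typescript
// Local stand-ins for illustration; the real AudioChunk and MusicResult
// types live in the library and may carry different fields.
type AudioChunk = { readonly bytes: Uint8Array }
type MusicResult = { readonly bytes: Uint8Array }

// Hypothetical sync-only provider call: prompt in, full result out.
// The payload is fake (one byte per prompt character), just to have data to move.
const generateSync = (prompt: string): MusicResult => {
  const bytes = new Uint8Array(prompt.length)
  for (let i = 0; i < bytes.length; i++) bytes[i] = prompt.charCodeAt(i) & 0xff
  return { bytes }
}

// Single-chunk emulation: run the sync call, then emit its bytes as one chunk.
function* streamGenerationEmulated(prompt: string): Generator<AudioChunk> {
  const result = generateSync(prompt)
  yield { bytes: result.bytes }
}

// A natively chunked provider would emit several smaller chunks instead.
function* streamGenerationNative(prompt: string, chunkSize: number): Generator<AudioChunk> {
  const { bytes } = generateSync(prompt)
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    yield { bytes: bytes.subarray(offset, offset + chunkSize) }
  }
}

// Consumers handle both the same way: concatenate chunks in arrival order.
const collect = (chunks: Iterable<AudioChunk>): Uint8Array => {
  const parts = Array.from(chunks, (chunk) => chunk.bytes)
  const total = parts.reduce((sum, part) => sum + part.length, 0)
  const out = new Uint8Array(total)
  let offset = 0
  parts.forEach((part) => {
    out.set(part, offset)
    offset += part.length
  })
  return out
}

const viaEmulation = collect(streamGenerationEmulated("lo-fi piano"))
const viaNative = collect(streamGenerationNative("lo-fi piano", 4))
```

Because `collect` only sees a sequence of chunks, code written against the streaming shape keeps working whether a provider emulates chunking or streams natively.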
The Shape
    interface MusicGeneratorService {
      readonly generate: (req: CommonGenerateMusicRequest) => Effect<MusicResult, AiError>
      readonly streamGeneration: (req: CommonStreamGenerateMusicRequest) => Stream<AudioChunk, AiError>
      readonly streamGenerationFrom: <E, R>(
        input: Stream<MusicSessionInput, E, R>,
        req: CommonStreamGenerateMusicRequest,
      ) => Stream<AudioChunk, AiError | E, R>
    }

Prompt Shape
    type CommonGenerateMusicRequest = {
      readonly model: string
      readonly prompts: string | ReadonlyArray<WeightedPrompt>
      readonly lyrics?: string // optionally with [Verse] / [Chorus] tags
      readonly durationSeconds?: number
      readonly bpm?: number
      readonly scale?: string // provider vocabulary, e.g. Lyria "C_MAJOR"
      readonly instrumental?: boolean
      readonly outputFormat?: AudioFormat
    }

    type WeightedPrompt = {
      readonly text: string
      readonly weight?: number // default 1.0
    }

`prompts` can be one string or a weighted blend. `lyrics` can include
section tags such as `[Verse]` and `[Chorus]`. `bpm`, `scale`, and
`instrumental` are hints; each provider decides whether those are
structured wire fields or prompt text.
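
The two prompt forms and the "hints as prompt text" option can be sketched as follows. This is an illustration under assumptions, not how any adapter actually does it: `PromptRequest` is a trimmed local copy of the request type, and `normalizePrompts` and `hintsAsPromptText`, along with the exact wording they produce, are hypothetical.

```typescript
// Local copies of the request types, trimmed to the fields this sketch uses.
type WeightedPrompt = { readonly text: string; readonly weight?: number }
type PromptRequest = {
  readonly prompts: string | ReadonlyArray<WeightedPrompt>
  readonly bpm?: number
  readonly scale?: string
  readonly instrumental?: boolean
}

// Normalize both prompt forms to a weighted list, applying the 1.0 default.
const normalizePrompts = (
  prompts: string | ReadonlyArray<WeightedPrompt>,
): ReadonlyArray<Required<WeightedPrompt>> =>
  typeof prompts === "string"
    ? [{ text: prompts, weight: 1.0 }]
    : prompts.map((p) => ({ text: p.text, weight: p.weight ?? 1.0 }))

// Hypothetical fallback for a provider with no structured bpm/scale fields:
// fold the hints into the prompt text instead of dropping them.
const hintsAsPromptText = (req: PromptRequest): string => {
  const hints: Array<string> = []
  if (req.bpm !== undefined) hints.push(`${req.bpm} bpm`)
  if (req.scale !== undefined) hints.push(`key ${req.scale}`)
  if (req.instrumental === true) hints.push("instrumental, no vocals")
  const base = normalizePrompts(req.prompts)
    .map((p) => (p.weight === 1.0 ? p.text : `${p.text} (weight ${p.weight})`))
    .join("; ")
  return hints.length === 0 ? base : `${base}. ${hints.join(", ")}`
}

const request: PromptRequest = {
  prompts: [{ text: "Lo-fi piano", weight: 2 }, { text: "brushed drums" }],
  bpm: 70,
  scale: "C_MAJOR",
  instrumental: true,
}
const promptText = hintsAsPromptText(request)
// promptText === "Lo-fi piano (weight 2); brushed drums. 70 bpm, key C_MAJOR, instrumental, no vocals"
```

A provider with structured wire fields would skip `hintsAsPromptText` and pass `bpm` and `scale` through directly; callers write the same request either way.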
What You Get Back
    type MusicResult = AudioBlob & {
      readonly songId?: string
      readonly lyrics?: string
      readonly sections?: ReadonlyArray<{ label: string; startSeconds: number; endSeconds: number }>
      readonly watermark?: { kind: string }
    }

`watermark` is always set for Lyria (SynthID). `lyrics` and `sections`
are present only when the model returned them.
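
Since every metadata field is optional, consuming code has to tolerate their absence. The sketch below shows one way to do that. It uses a local `MusicMetadata` type that copies only the optional fields (the `AudioBlob` part is left out), and `sectionAt`, `describe`, and the `"synthid"` kind string are hypothetical, chosen for illustration.

```typescript
type Section = {
  readonly label: string
  readonly startSeconds: number
  readonly endSeconds: number
}

// Only the optional metadata fields of MusicResult; AudioBlob is omitted here.
type MusicMetadata = {
  readonly lyrics?: string
  readonly sections?: ReadonlyArray<Section>
  readonly watermark?: { readonly kind: string }
}

// Find the section playing at a given time, treating each range as [start, end).
const sectionAt = (meta: MusicMetadata, seconds: number): Section | undefined =>
  (meta.sections ?? []).find((s) => seconds >= s.startSeconds && seconds < s.endSeconds)

// One-line summary that works even when every optional field is absent.
const describe = (meta: MusicMetadata): string => {
  const parts = [
    meta.watermark ? `watermark: ${meta.watermark.kind}` : "no watermark",
    meta.lyrics !== undefined ? "lyrics returned" : "no lyrics",
    `${meta.sections?.length ?? 0} sections`,
  ]
  return parts.join(", ")
}

const sample: MusicMetadata = {
  watermark: { kind: "synthid" }, // illustrative value, not a documented constant
  sections: [
    { label: "Verse", startSeconds: 0, endSeconds: 12 },
    { label: "Chorus", startSeconds: 12, endSeconds: 24 },
  ],
}

const summary = describe(sample) // "watermark: synthid, no lyrics, 2 sections"
const atTwelve = sectionAt(sample, 12)?.label // "Chorus"
```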
Capability Marker
MusicInteractiveSession gates streamGenerationFrom. Today no
provider Layer ships it; Lyria RealTime is the planned implementation.
Calling streamGenerationFrom against @effect-uai/google/LyriaGenerator
is a compile-time error until that lands.
This is the same phantom-marker pattern as
SttStreaming / TtsIncrementalText —
provider capability gaps surface at Effect.provide typechecking, not
runtime.
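
The idea of a capability gap surfacing at compile time can be illustrated in plain TypeScript. This is a simplified analogy and not the library's actual encoding, which goes through Effect's requirements channel: `Provider`, `SyncGeneration`, the `_caps` field, and the `realtime` provider are hypothetical, and `MusicInteractiveSession` here is a local stand-in for the real marker.

```typescript
// Marker types: they exist only at the type level and carry no runtime value.
interface SyncGeneration { readonly syncGeneration: true }
interface MusicInteractiveSession { readonly interactiveSession: true }

// Caps is a phantom parameter: `_caps` is never populated at runtime.
interface Provider<Caps> {
  readonly name: string
  readonly _caps?: Caps
}

// Callable with any provider whose capability set includes sync generation.
const generate = (provider: Provider<SyncGeneration>, prompt: string): string =>
  `${provider.name} generated "${prompt}"`

// Callable only with providers that carry the interactive-session marker.
const streamGenerationFrom = (provider: Provider<MusicInteractiveSession>): string =>
  `${provider.name} opened an interactive session`

const lyria: Provider<SyncGeneration> = { name: "lyria" }
const realtime: Provider<SyncGeneration & MusicInteractiveSession> = {
  name: "hypothetical-realtime",
}

const clip = generate(lyria, "lo-fi piano")
const session = streamGenerationFrom(realtime)

// @ts-expect-error lyria's capability set lacks MusicInteractiveSession
streamGenerationFrom(lyria)
```

The rejected call never needs a runtime check: the type checker refuses it because `Provider<SyncGeneration>` is not assignable to `Provider<MusicInteractiveSession>`.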
Provider matrix
| Provider | Sync | Chunked stream | Bidi session |
|---|---|---|---|
| Google Lyria | ✓ (clip + pro) | ✓ (single-chunk emul.) | — (planned: Lyria RealTime) |
The matrix is small today. Suno, Udio, Mureka, MiniMax, and Stable Audio are candidates for follow-up phases; they fit the same service-tag shape.
Next step
Basic music generation — a simple prompt and a weighted-prompt variant with lyrics + BPM hints, both against Lyria 3.
See also
- Google Lyria provider — models, request shape, watermark notes.
- Speech — sibling capability for STT and TTS.