Basic transcription
A finished audio file should be easy to treat like text.
This recipe takes a voice note, meeting clip, podcast excerpt, or any other file you already have, sends it to a transcription provider, and returns the transcript. The same program can run on OpenAI or Gemini; the provider choice stays in the runner.
Scenario. You have meeting.mp3 and want the text. If you are on
OpenAI Whisper you can also ask for word timestamps and build a simple
timeline.
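One way such a timeline could be built is sketched below. The WordTimestamp shape used here (word, start, and end, with times in seconds) is an assumption for illustration, and buildTimeline and formatTime are hypothetical helpers, not part of the library. The sketch starts a new timeline entry whenever the pause between two words is longer than a threshold.

```typescript
// Hypothetical shape: the library's real WordTimestamp type may differ.
interface WordTimestamp {
  readonly word: string
  readonly start: number // seconds from the start of the audio
  readonly end: number
}

interface TimelineEntry {
  readonly start: number
  readonly end: number
  readonly text: string
}

// Group words into entries, splitting wherever the silence between two
// consecutive words is longer than maxGapSeconds.
function buildTimeline(
  words: ReadonlyArray<WordTimestamp>,
  maxGapSeconds = 1.0,
): TimelineEntry[] {
  const entries: TimelineEntry[] = []
  let current: WordTimestamp[] = []

  const flush = (): void => {
    const first = current[0]
    const last = current[current.length - 1]
    if (first === undefined || last === undefined) return
    entries.push({
      start: first.start,
      end: last.end,
      text: current.map((w) => w.word).join(" "),
    })
    current = []
  }

  for (const w of words) {
    const previous = current[current.length - 1]
    if (previous !== undefined && w.start - previous.end > maxGapSeconds) flush()
    current.push(w)
  }
  flush()
  return entries
}

// Format seconds as mm:ss for display.
function formatTime(seconds: number): string {
  const whole = Math.floor(seconds)
  const mm = String(Math.floor(whole / 60)).padStart(2, "0")
  const ss = String(whole % 60).padStart(2, "0")
  return `${mm}:${ss}`
}
```

Each entry can then be printed as a line such as the formatted start time followed by the entry text.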
The Shape
One call does the work:
    import { transcribe } from "@effect-uai/core/Transcriber"

    const result = yield* transcribe({
      audio: { _tag: "bytes", bytes: audioBytes, mimeType: "audio/mpeg" },
      model: "gpt-4o-transcribe",
      language: "en",
    })
    // result.text   : string
    // result.words? : WordTimestamp[]  (only with wordTimestamps + whisper-1)

The important part is the boundary: audio bytes in, typed transcript
data out. index.ts only depends on the generic Transcriber tag, so
the runner can provide OpenAI or Gemini without changing the recipe
body.
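The provider independence described above can be illustrated in plain TypeScript. This sketch deliberately does not use the library's Transcriber tag or Effect's dependency mechanism: the Transcriber interface, the two stub providers, and transcribeMeeting are hypothetical stand-ins. It only shows why code written against an interface stays the same when the provider changes.

```typescript
// Request and result shapes modelled on the call above, simplified for illustration.
interface TranscribeRequest {
  readonly audio: {
    readonly _tag: "bytes"
    readonly bytes: Uint8Array
    readonly mimeType: string
  }
  readonly model: string
  readonly language?: string
}

interface TranscribeResult {
  readonly text: string
}

// Hypothetical stand-in for the generic Transcriber service.
interface Transcriber {
  transcribe(request: TranscribeRequest): Promise<TranscribeResult>
}

// Stub providers: real ones would call the OpenAI or Gemini APIs.
const fakeOpenAi: Transcriber = {
  transcribe: async (req) => ({
    text: `[openai:${req.model}] ${req.audio.bytes.length} bytes`,
  }),
}

const fakeGemini: Transcriber = {
  transcribe: async (req) => ({
    text: `[gemini:${req.model}] ${req.audio.bytes.length} bytes`,
  }),
}

// The "recipe body": it sees only the interface, never a concrete provider.
async function transcribeMeeting(
  transcriber: Transcriber,
  model: string,
  audioBytes: Uint8Array,
): Promise<string> {
  const result = await transcriber.transcribe({
    audio: { _tag: "bytes", bytes: audioBytes, mimeType: "audio/mpeg" },
    model,
    language: "en",
  })
  return result.text
}
```

A runner would pick fakeOpenAi or fakeGemini (or their real counterparts) and pass it in; transcribeMeeting itself never changes.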
Fast Text Or Timestamps
The recipe includes two paths:
- Fast uses the provider’s normal text-only model. It works on both OpenAI and Gemini.
- Verbose uses OpenAI whisper-1 with wordTimestamps: true, so you get result.words as well as result.text.
Gemini’s transcription is prompt-driven and text-only, so the runner
skips the timestamp variant when you choose --provider gemini.
| Provider | Fast model | Timestamp path |
|---|---|---|
| openai | gpt-4o-transcribe | whisper-1 |
| gemini | gemini-2.5-flash | not supported |
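
The table can be read as a small lookup. The sketch below is one possible encoding, not the runner's actual code: the model names come from the table, while Provider, Mode, Plan, and planFor are hypothetical names. It returns undefined for the combination the runner skips.

```typescript
type Provider = "openai" | "gemini"
type Mode = "fast" | "verbose"

interface Plan {
  readonly model: string
  readonly wordTimestamps: boolean
}

// Fast, text-only model per provider (values taken from the table above).
const FAST_MODEL: Record<Provider, string> = {
  openai: "gpt-4o-transcribe",
  gemini: "gemini-2.5-flash",
}

// Returns undefined when the provider has no timestamp path, which a
// caller can treat as "skip this variant".
function planFor(provider: Provider, mode: Mode): Plan | undefined {
  if (mode === "fast") {
    return { model: FAST_MODEL[provider], wordTimestamps: false }
  }
  if (provider === "openai") {
    return { model: "whisper-1", wordTimestamps: true }
  }
  return undefined
}
```

Keeping the unsupported case as an explicit undefined makes the skip visible at the call site instead of failing inside the provider call.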
Run it
    # Default: OpenAI
    OPENAI_API_KEY=sk-... pnpm tsx recipes/basic-transcription/run-node.ts path/to/audio.wav

    # Gemini
    GOOGLE_API_KEY=... pnpm tsx recipes/basic-transcription/run-node.ts --provider gemini path/to/audio.wav

Accepted formats: m4a, mp3, mp4, mpeg, mpga, oga, ogg,
wav, webm, flac. Gemini caps total inline request size at 20 MB.
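
A pre-flight check for these two constraints might look like the sketch below. It rests on stated assumptions: that "20 MB" means 20 * 1024 * 1024 bytes, and that inline audio is base64-encoded in the request, so the encoded size is what counts. The cap covers the whole request, so the prompt and other fields add to it; treat this as a lower-bound check. checkAudio and its helpers are hypothetical names.

```typescript
// Extensions listed in the README.
const ACCEPTED_EXTENSIONS: ReadonlySet<string> = new Set([
  "m4a", "mp3", "mp4", "mpeg", "mpga", "oga", "ogg", "wav", "webm", "flac",
])

// Assumption: "20 MB" is interpreted as 20 MiB.
const GEMINI_INLINE_LIMIT_BYTES = 20 * 1024 * 1024

// Size of the payload after base64 encoding (4 output bytes per 3 input bytes, padded).
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3)
}

// Lower-cased extension without the dot, or "" when the name has none.
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".")
  return dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase()
}

// Returns a human-readable problem, or undefined when the file looks acceptable.
function checkAudio(
  fileName: string,
  byteLength: number,
  provider: "openai" | "gemini",
): string | undefined {
  const ext = extensionOf(fileName)
  if (!ACCEPTED_EXTENSIONS.has(ext)) {
    return `unsupported format: "${ext}"`
  }
  if (provider === "gemini" && base64Length(byteLength) > GEMINI_INLINE_LIMIT_BYTES) {
    return "audio is too large for a Gemini inline request"
  }
  return undefined
}
```

Running a check like this before reading the provider's error response gives a clearer message for the two most common failures.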
What This Generalizes To
Use transcribe whenever you have the whole audio asset up front:
uploads, async jobs, podcast processing, meeting notes. For a live mic,
switch to Streaming transcription;
it is the same service, but the input is a Stream<Uint8Array>
and the output is a stream of partial and final transcript events.
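
To make the partial versus final distinction concrete, here is a sketch that folds such an event stream into finished text. It uses a plain AsyncIterable instead of an Effect Stream, and the TranscriptEvent shape (a _tag of "partial" or "final" plus text) is an assumption; the streaming recipe's real event type may differ.

```typescript
// Hypothetical event shape for illustration only.
type TranscriptEvent =
  | { readonly _tag: "partial"; readonly text: string }
  | { readonly _tag: "final"; readonly text: string }

// Keep only final segments; partials are provisional guesses that a
// later final event for the same segment supersedes.
async function collectFinalText(
  events: AsyncIterable<TranscriptEvent>,
): Promise<string> {
  const segments: string[] = []
  for await (const event of events) {
    if (event._tag === "final") segments.push(event.text)
  }
  return segments.join(" ")
}

// Demo source standing in for a live transcription stream.
async function* demoEvents(): AsyncGenerator<TranscriptEvent> {
  yield { _tag: "partial", text: "hel" }
  yield { _tag: "partial", text: "hello ev" }
  yield { _tag: "final", text: "hello everyone" }
  yield { _tag: "partial", text: "let's" }
  yield { _tag: "final", text: "let's begin" }
}
```

A live UI would typically render each partial as it arrives and replace it once the matching final event lands; this sketch only keeps the finals.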
The full source lives next to this README at
index.ts.