Basic metrics
When you stream a model turn, you usually want to know how it is going while
it goes: how long until the first token, how fast tokens are coming, how many
you spent, how long the whole thing took. effect-uai exposes each of these as
a small stream operator you stack onto a generation. They emit typed
MetricEvents interleaved with the model’s own events, at their own cadence,
and pass everything else through untouched.
Scenario. Ask Gemini Flash for a very long (≈20 page) fantasy story. The story streams to a file; the only thing we log is the metrics.
Attach the meters
The recipe builds one long generation and pipes it through
Metrics.allMetrics, which stacks all four built-in meters. Throughput is
configured in tokens per second using a 4-characters-≈-1-token estimate
(the library ships no tokenizer, and a live rate cannot use the provider’s
authoritative count, which only arrives at the end):

    LanguageModel.streamTurn({
      model: cfg.model,
      history: [Items.systemText(SYSTEM_PROMPT), Items.userText(cfg.prompt)],
      maxOutputTokens: cfg.maxOutputTokens,
    }).pipe(
      Metrics.allMetrics({
        throughput: { every: "1 second", unit: "token", tokenizer: estimateTokens },
      }),
    )

allMetrics is just sugar for stacking the four operators; pick a subset by
piping them yourself (Metrics.timeToFirstToken(), Metrics.tokenTotals,
…). Each meter widens the stream with its own event and leaves the story
deltas alone.
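
The recipe's estimateTokens helper is not shown on this page. As a minimal sketch of the 4-characters-≈-1-token heuristic described above, it could look like the following. The `(text: string) => number` signature is an assumption about what the tokenizer option accepts, and CHARS_PER_TOKEN is a name introduced here for illustration.

```typescript
// Assumption: the `tokenizer` option accepts a function from text to a token
// count. This signature is a guess for illustration, not the library's API.
const CHARS_PER_TOKEN = 4;

// Rough estimate: one token per four characters, rounded up so that any
// non-empty delta counts as at least one token.
const estimateTokens = (text: string): number =>
  Math.ceil(text.length / CHARS_PER_TOKEN);

console.log(estimateTokens("Once upon a time")); // 16 characters -> 4
```

Note that `text.length` counts UTF-16 code units, so the estimate drifts for text heavy in emoji or non-Latin scripts; that is acceptable for a live rate, since the exact counts arrive at the end of the turn.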
Split text from metrics
The metered stream carries two kinds of element: the model’s TurnEvents
(including TextDelta, the story text) and the MetricEvents. The runner
tells them apart with the structural guard Metrics.isMetricEvent and routes
each: story deltas are written to a scoped file handle as they arrive (so the
20 pages are never buffered in memory), metric samples are logged.

    fantasyStory(cfg).pipe(
      Stream.runForEach((event) =>
        Metrics.isMetricEvent(event)
          ? logMetric(event)
          : event._tag === "TextDelta"
            ? Effect.asVoid(file.write(encoder.encode(event.text)))
            : Effect.void,
      ),
    )

A typical run logs something like:

    TTFT 420ms (text)
    throughput ~180 token/s
    throughput ~205 token/s
    throughput ~198 token/s
    ...
    tokens in=24 out=13180 total=13204
    completed 71.4s total, 71.0s generating

TTFT fires the instant the first token lands. throughput ticks once a
second while the model writes. tokens and completed land together when the
turn finishes; tokens are the provider’s authoritative counts, so the final
token figure is exact even though the live throughput was estimated.
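
Because the final counts are exact, they can be used to sanity-check the estimated live rate. The sketch below divides the provider's output token count by the generating time, using the figures from the sample log. averageTokensPerSecond is a hypothetical helper written for this illustration, not part of effect-uai.

```typescript
// Hypothetical helper (not a library function): average output rate computed
// from the exact figures that arrive when the turn finishes.
const averageTokensPerSecond = (
  outputTokens: number,
  generatingSeconds: number,
): number => (generatingSeconds > 0 ? outputTokens / generatingSeconds : 0);

// Figures from the sample log: out=13180, 71.0s generating.
const rate = averageTokensPerSecond(13180, 71.0);
console.log(rate.toFixed(1)); // "185.6"
```

An average of about 186 token/s sits inside the ~180 to ~205 range of the live samples, which suggests the character-based estimate was reasonable for this run.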
Run it

    GOOGLE_API_KEY=... pnpm tsx recipes/basic-metrics/run-node.ts

    # override the story, model, or output path:
    PROMPT="a story about a clockwork dragon" OUTPUT_FILE=dragon.txt \
      GOOGLE_API_KEY=... pnpm tsx recipes/basic-metrics/run-node.ts

The same app.ts runs under Bun (run-bun.ts) and Deno (run-deno.ts); only
the platform HttpClient + FileSystem differ.
Where to go next
- Export instead of log. Telemetry.record() records the same MetricEvents into effect Metric instruments, and Telemetry.layerOtlp ships them to an OTLP backend. The meters do not change; you add a sink.
- Custom metrics. Metrics.makeEvent mints your own branded metric event (a tool-latency timer, a cost gauge, …) that the same recorder exports with no changes.
- Real tokens. Replace the estimateTokens heuristic with a tokenizer (for example @huggingface/transformers) for exact live token rates.