Metrics

While a generation streams, you usually want to know how it’s going: how long until the first token, how fast tokens are arriving, how many tokens you spent, and how long the whole thing took. Each of those is a small operator you stack onto the stream. The operators emit typed MetricEvents alongside the model’s own events, at their own cadence, and leave everything else untouched.

Attach the meters

Metrics.allMetrics stacks all four built-ins onto a turn:

import * as Metrics from "@effect-uai/core/Metrics"
const metered = LanguageModel.streamTurn(request).pipe(Metrics.allMetrics())

Now your stream carries two kinds of element: the model’s TurnEvents (the text deltas) and the MetricEvents. Tell them apart with isMetricEvent and do what you like with each - here, log the metrics and ignore everything else:

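import { Console, Effect, Stream } from "effect"
metered.pipe(
  // Log metric events; every other event passes through a no-op.
  Stream.runForEach((event) => (Metrics.isMetricEvent(event) ? Console.log(event) : Effect.void)),
)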
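
If you want the text as well as the metrics, the same type-guard idea splits a mixed sequence into both. The sketch below is plain TypeScript rather than the library's API: the `TextDelta`, `MetricSample` and `splitStream` names are hypothetical stand-ins, and the real event types will differ in detail.

```typescript
// Hypothetical event shapes for illustration only; the library's TurnEvent
// and MetricEvent types are richer than this.
type TextDelta = { _tag: "TextDelta"; text: string }
type MetricSample = { _tag: "Metric"; name: string; value: number }
type StreamItem = TextDelta | MetricSample

// A type guard in the spirit of isMetricEvent: it narrows the union.
const isMetricSample = (item: StreamItem): item is MetricSample => item._tag === "Metric"

// Split one mixed sequence into the concatenated text and the metric samples.
function splitStream(items: ReadonlyArray<StreamItem>): { text: string; metrics: MetricSample[] } {
  let text = ""
  const metrics: MetricSample[] = []
  for (const item of items) {
    if (isMetricSample(item)) {
      metrics.push(item)
    } else {
      // Narrowed to TextDelta here, so `text` is available.
      text += item.text
    }
  }
  return { text, metrics }
}
```

Because the guard narrows the union, each branch sees only the fields that exist on its own kind of event.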

The four samples and the fields you read off them:

Operator | Fires | Read
timeToFirstToken | on the first token | elapsed, kind
throughput | every interval | ratePerSecond, unit, window
tokenTotals | when the turn finishes | usage, cumulative
timeToCompletion | when the turn finishes | duration, generation

The built-in meters read TurnEvent, so they measure language-model turns. The event/export machinery below is general - it records anything you emit.

Measure only what you need

The meters are independent operators; pipe just the ones you want instead of allMetrics:

LanguageModel.streamTurn(request).pipe(Metrics.timeToFirstToken(), Metrics.tokenTotals)

throughput reports a live rate. It counts characters by default (exact on every provider); for tokens, hand it a tokenizer, or estimate:

Metrics.throughput({
  every: "1 second",
  unit: "token",
  // Rough estimate: about four characters per token; non-text events count as zero.
  tokenizer: (event) => Effect.succeed(event._tag === "TextDelta" ? event.text.length / 4 : 0),
})

It’s windowed by default (the current rate); pass mode: "cumulative" for a running average, or smooth: "default" to damp the jitter.
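
To make the difference between the modes concrete, here is a minimal sketch of the arithmetic in plain TypeScript. It is not the library's implementation: `Sample`, `windowedRate`, `cumulativeRate` and `smoothRates` are hypothetical names, and the smoothing shown is an exponential moving average chosen for illustration, which may not be what `smooth: "default"` does.

```typescript
// One observation: `count` units (characters or tokens) arrived at time `atMs`.
type Sample = { atMs: number; count: number }

// Windowed rate: units seen in the interval (nowMs - windowMs, nowMs], per second.
function windowedRate(samples: ReadonlyArray<Sample>, nowMs: number, windowMs: number): number {
  let units = 0
  for (const s of samples) {
    if (s.atMs > nowMs - windowMs && s.atMs <= nowMs) units += s.count
  }
  return units / (windowMs / 1000)
}

// Cumulative rate: everything seen up to nowMs, divided by the time since startMs.
function cumulativeRate(samples: ReadonlyArray<Sample>, startMs: number, nowMs: number): number {
  const elapsedMs = nowMs - startMs
  if (elapsedMs <= 0) return 0
  let units = 0
  for (const s of samples) {
    if (s.atMs <= nowMs) units += s.count
  }
  return units / (elapsedMs / 1000)
}

// Assumed smoothing: exponential moving average over successive rates,
// with alpha in (0, 1]. Smaller alpha damps jitter more.
function smoothRates(rates: ReadonlyArray<number>, alpha: number): number[] {
  const out: number[] = []
  let previous: number | undefined
  for (const rate of rates) {
    previous = previous === undefined ? rate : alpha * rate + (1 - alpha) * previous
    out.push(previous)
  }
  return out
}
```

The windowed figure reacts quickly to bursts and stalls, while the cumulative figure settles toward the run's overall average.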

One generation or a whole loop

Scope follows where you attach. The same meter gives you per-generation numbers on a single turn and whole-run numbers on a loop:

LanguageModel.streamTurn(request).pipe(Metrics.tokenTotals) // this generation
Loop.loop(initial, body).pipe(Metrics.tokenTotals) // the whole loop

tokenTotals emits both this turn’s usage and the cumulative total across every turn it has seen, so on a loop the latest sample is always the running total.
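
The per-turn and cumulative relationship can be sketched as a simple running sum. This is an illustration under assumptions, not the library's code: the `Usage` shape with `inputTokens` and `outputTokens`, and the `runningTotals` helper, are hypothetical.

```typescript
// Hypothetical usage shape; the library's usage record may carry more fields.
type Usage = { inputTokens: number; outputTokens: number }

// One sample per finished turn: that turn's usage plus the total so far.
type TotalsSample = { turnIndex: number; usage: Usage; cumulative: Usage }

function runningTotals(turns: ReadonlyArray<Usage>): TotalsSample[] {
  const samples: TotalsSample[] = []
  let cumulative: Usage = { inputTokens: 0, outputTokens: 0 }
  turns.forEach((usage, turnIndex) => {
    // Build a fresh object each turn so earlier samples are not mutated.
    cumulative = {
      inputTokens: cumulative.inputTokens + usage.inputTokens,
      outputTokens: cumulative.outputTokens + usage.outputTokens,
    }
    samples.push({ turnIndex, usage, cumulative })
  })
  return samples
}
```

On a single turn the two fields coincide; on a loop, the last sample's cumulative field is the bill for the whole run.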

Send it to your dashboard

To export instead of (or alongside) logging, record the same events into metric instruments and provide an OTLP layer. Nothing about the meters changes - you add a sink:

import * as Telemetry from "@effect-uai/core/Telemetry"
metered
  .pipe(Telemetry.record({ attributes: { model: request.model } }), Stream.runDrain)
  .pipe(Effect.provide(Telemetry.layerOtlp({ url: "http://localhost:4318/v1/metrics" })))

layerOtlp leaves the HttpClient to your runtime, so provide NodeHttpClient or FetchHttpClient at the edge of your program.

Measure your own thing

makeEvent mints a custom metric event - a tool-latency timer, a cost gauge, anything. Emit it from your own operator and the same record exports it, with no change to the recorder:

Metrics.makeEvent({
  _tag: "ToolLatency",
  turnIndex: 0,
  measurements: [{ name: "tool_latency", kind: "timer", value: Duration.millis(elapsed) }],
})
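
The `elapsed` value in the call above has to come from timing the tool yourself. One way to do that is sketched below in plain TypeScript, under stated assumptions: `timeTool` and `ToolLatencyPayload` are hypothetical names, the payload only mirrors the fields shown above, and the timer value is held as plain milliseconds rather than a Duration.

```typescript
// Hypothetical payload mirroring the fields passed to makeEvent above.
// The value is plain milliseconds here, not a Duration.
type ToolLatencyPayload = {
  _tag: "ToolLatency"
  turnIndex: number
  measurements: Array<{ name: string; kind: "timer"; value: number }>
}

// Run a synchronous tool and time it. The clock is injected so the
// measurement can be tested deterministically; it defaults to Date.now.
function timeTool<A>(
  turnIndex: number,
  tool: () => A,
  now: () => number = Date.now,
): { result: A; payload: ToolLatencyPayload } {
  const startedAt = now()
  const result = tool()
  const elapsed = now() - startedAt
  return {
    result,
    payload: {
      _tag: "ToolLatency",
      turnIndex,
      measurements: [{ name: "tool_latency", kind: "timer", value: elapsed }],
    },
  }
}
```

The resulting payload holds the same fields you would hand to makeEvent, with the millisecond value converted to a Duration first.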

See it run

The basic metrics recipe meters a long Gemini generation end to end - story to a file, metrics to the log.