Google Gemini
The Gemini provider wraps Google’s streamGenerateContent SSE endpoint
and maps it onto the core LanguageModelService shape. Thinking budget
is a first-class option for the 2.5+ model line.
Install
    pnpm add @effect-uai/core @effect-uai/google effect

Wire it up
    import { Config, Effect, Layer } from "effect"
    import { FetchHttpClient } from "effect/unstable/http"
    import { Gemini, layer as geminiLayer } from "@effect-uai/google"

    const provider = Layer.unwrap(
      Effect.gen(function* () {
        const apiKey = yield* Config.redacted("GEMINI_API_KEY")
        return geminiLayer({ apiKey })
      }),
    )

    const mainLayer = provider.pipe(Layer.provide(FetchHttpClient.layer))

geminiLayer registers two service tags from one underlying
implementation:

- Gemini: the typed tag. Yield this when you want Gemini-specific options (thinkingBudget).
- LanguageModel: the generic tag. Yield this in provider-portable code; only CommonRequestOptions is accepted at the call site.
Config
    interface Config {
      readonly apiKey: Redacted.Redacted
      readonly baseUrl?: string // defaults to https://generativelanguage.googleapis.com/v1beta
    }

The layer carries connection details only; the model is chosen per call
(see below). apiKey is always Redacted.Redacted, never a raw string.
Read it with Config.redacted("GEMINI_API_KEY") or wrap manually with
Redacted.make.
baseUrl exists for proxies and self-hosted gateways that speak the
Gemini protocol. Most apps leave it unset.
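
As a rough illustration of what baseUrl controls, the sketch below builds the streaming URL for one model. It assumes the provider joins baseUrl with the public Gemini REST path (models/{model}:streamGenerateContent with alt=sse); EndpointSettings and streamEndpoint are hypothetical names for this example, not part of @effect-uai/google.

```typescript
// Hypothetical settings type that mirrors only the baseUrl field of Config above.
interface EndpointSettings {
  readonly baseUrl?: string
}

// Default taken from the comment on Config.baseUrl.
const DEFAULT_BASE_URL = "https://generativelanguage.googleapis.com/v1beta"

// Build the streaming endpoint URL for one model (assumed joining logic).
// `alt=sse` asks the Gemini REST API to respond with server-sent events.
function streamEndpoint(settings: EndpointSettings, model: string): string {
  // Drop trailing slashes so a gateway URL ending in "/" still joins cleanly.
  const base = (settings.baseUrl ?? DEFAULT_BASE_URL).replace(/\/+$/, "")
  return `${base}/models/${encodeURIComponent(model)}:streamGenerateContent?alt=sse`
}

// Unset baseUrl falls back to the public endpoint.
console.log(streamEndpoint({}, "gemini-2.5-flash"))

// A gateway that speaks the Gemini protocol only changes the prefix.
console.log(streamEndpoint({ baseUrl: "https://proxy.example.com/gemini/" }, "gemini-2.5-flash"))
```

A proxy therefore only needs to preserve the path after the prefix for requests to keep working.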
Request shape
    interface GeminiRequest extends Omit<CommonRequest, "model"> {
      readonly model: GoogleModel // narrows CommonRequest.model: string
      readonly thinkingBudget?: number
    }

On top of the core CommonRequest (history, model, tools,
toolChoice, temperature, maxOutputTokens):

- model: typed against GoogleModel for autocomplete at the call site.
- thinkingBudget: Gemini 2.5+ thinking budget, forwarded as generationConfig.thinkingConfig.thinkingBudget. Set to 0 to disable thinking entirely (lowest latency, fastest first-token); higher values let the model think longer before emitting output.
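
To make the forwarding concrete, here is a minimal sketch of how request options could be folded into the generationConfig object of the Gemini REST body. The field names inside generationConfig follow the Gemini REST API; GenerationOptions and toGenerationConfig are hypothetical names, and the provider's actual mapping may differ.

```typescript
// Hypothetical subset of GeminiRequest, limited to the fields that feed generationConfig.
interface GenerationOptions {
  readonly temperature?: number
  readonly maxOutputTokens?: number
  readonly thinkingBudget?: number
}

// Subset of the generationConfig object in the Gemini REST request body.
interface GenerationConfig {
  temperature?: number
  maxOutputTokens?: number
  thinkingConfig?: { thinkingBudget: number }
}

// Copy only the options that were set, so unset fields fall back to server defaults.
function toGenerationConfig(options: GenerationOptions): GenerationConfig {
  const config: GenerationConfig = {}
  if (options.temperature !== undefined) config.temperature = options.temperature
  if (options.maxOutputTokens !== undefined) config.maxOutputTokens = options.maxOutputTokens
  // 0 is a meaningful value (thinking disabled), so compare against undefined, not truthiness.
  if (options.thinkingBudget !== undefined) {
    config.thinkingConfig = { thinkingBudget: options.thinkingBudget }
  }
  return config
}

console.log(JSON.stringify(toGenerationConfig({ thinkingBudget: 0 })))
// prints {"thinkingConfig":{"thinkingBudget":0}}
```

The explicit undefined check matters here: a truthiness test would silently drop thinkingBudget: 0 and leave thinking enabled.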
Calling it
    import { Effect, Stream } from "effect"
    import { Gemini } from "@effect-uai/google"

    const turn = Effect.gen(function* () {
      const gemini = yield* Gemini
      return gemini.streamTurn({
        history, // the conversation so far, defined elsewhere
        model: "gemini-2.5-flash",
        thinkingBudget: 0,
      })
    })

streamTurn returns Stream<TurnDelta, AiError>. Pipe it through
Loop.onTurnComplete inside a loop body, or consume the deltas
directly for one-shot calls.
Models
GoogleModel is a literal union with a (string & {}) tail - you get
autocomplete on known IDs but can pass any string for models the SDK
hasn’t been updated for yet.
Known IDs (as of April 2026): gemini-3.1-pro-preview,
gemini-3-flash-preview, gemini-3.1-flash-lite-preview,
gemini-3.1-flash-live-preview, gemini-3.1-flash-tts-preview,
gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite.
Reference: Gemini models.
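
The (string & {}) tail described above is a general TypeScript pattern. The sketch below reproduces it with an abbreviated union; KnownModelSketch, GoogleModelSketch, and isKnownModel are hypothetical stand-ins, and the real GoogleModel type lists more IDs.

```typescript
// Abbreviated stand-in for the known-ID part of GoogleModel.
type KnownModelSketch = "gemini-2.5-pro" | "gemini-2.5-flash" | "gemini-2.5-flash-lite"

// `string & {}` accepts any string while keeping the literals visible to editor autocomplete.
// A plain `| string` would collapse the whole union to `string` and lose the suggestions.
type GoogleModelSketch = KnownModelSketch | (string & {})

const KNOWN_MODELS: ReadonlyArray<KnownModelSketch> = [
  "gemini-2.5-pro",
  "gemini-2.5-flash",
  "gemini-2.5-flash-lite",
]

// Runtime check for whether an ID is one of the listed literals.
function isKnownModel(model: GoogleModelSketch): model is KnownModelSketch {
  return KNOWN_MODELS.some((known) => known === model)
}

const listed: GoogleModelSketch = "gemini-2.5-flash" // autocompleted literal
const unlisted: GoogleModelSketch = "gemini-experimental-id" // hypothetical ID, still type-checks

console.log(isKnownModel(listed), isKnownModel(unlisted))
```

Both assignments compile; only the runtime check distinguishes a listed ID from an arbitrary string.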
Errors
HTTP failures map to typed AiError variants:
| Status | Error |
|---|---|
| 429 | AiError.RateLimited |
| 408/504 | AiError.Timeout |
| 401 | AiError.AuthFailed (auth) |
| 403 | AiError.AuthFailed (permission) |
| 402 | AiError.AuthFailed (billing) |
| 413 | AiError.ContextLengthExceeded |
| >= 500 | AiError.Unavailable |
| other 4xx | AiError.InvalidRequest |
Recover per-tag with Stream.catchTag("RateLimited", handler). See
multi-model fallback for cross-provider
recovery between Responses and Gemini.
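
The status table can be read as an ordered match, where specific codes are checked before the generic ranges. The sketch below encodes that reading; classifyStatus, Classified, and the reason field are hypothetical names for this example, and the provider's real AiError constructors are not shown here.

```typescript
// Tag names taken from the table above.
type ErrorTag =
  | "RateLimited"
  | "Timeout"
  | "AuthFailed"
  | "ContextLengthExceeded"
  | "Unavailable"
  | "InvalidRequest"

// Hypothetical result shape; `reason` mirrors the parenthesised notes in the table.
interface Classified {
  readonly tag: ErrorTag
  readonly reason?: "auth" | "permission" | "billing"
}

// Classify a failed HTTP status. Callers are assumed to pass only 4xx/5xx codes.
function classifyStatus(status: number): Classified {
  switch (status) {
    case 429:
      return { tag: "RateLimited" }
    case 408:
    case 504: // checked before the generic 5xx rule below
      return { tag: "Timeout" }
    case 401:
      return { tag: "AuthFailed", reason: "auth" }
    case 403:
      return { tag: "AuthFailed", reason: "permission" }
    case 402:
      return { tag: "AuthFailed", reason: "billing" }
    case 413:
      return { tag: "ContextLengthExceeded" }
  }
  if (status >= 500) return { tag: "Unavailable" }
  return { tag: "InvalidRequest" } // any other 4xx
}

console.log(classifyStatus(504).tag, classifyStatus(503).tag, classifyStatus(400).tag)
```

Ordering is the design point: 504 has to match Timeout before the >= 500 rule claims it as Unavailable.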