Google Gemini
The Gemini provider wraps Google’s streamGenerateContent SSE endpoint
and maps it onto the core LanguageModelService shape. Thinking budget
is a first-class option for the 2.5+ model line.
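To make the SSE framing concrete, here is a minimal sketch of how a buffered chunk of `data:` events could be split into JSON payloads. `parseSseBuffer` and `SseParseResult` are hypothetical names used for illustration only; this is not the provider's actual parser.

```typescript
// Hypothetical helper, not part of @betalyra/effect-uai-google.
interface SseParseResult {
  readonly payloads: ReadonlyArray<unknown>
  readonly rest: string
}

function parseSseBuffer(buffer: string): SseParseResult {
  // Normalise line endings so "\r\n\r\n" and "\n\n" both delimit events.
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n")
  // The last block has no terminating blank line yet; keep it for the next chunk.
  const rest = blocks.pop() ?? ""
  const payloads: unknown[] = []
  for (const block of blocks) {
    const data = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      // Drop the "data:" prefix and the single optional space after it.
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n")
    if (data.length > 0) payloads.push(JSON.parse(data))
  }
  return { payloads, rest }
}
```

Feeding `rest` back in front of the next network chunk keeps events intact when they are split across reads.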
Install

    pnpm add @betalyra/effect-uai-core @betalyra/effect-uai-google effect

Wire it up

    import { Config, Effect, Layer } from "effect"
    import { FetchHttpClient } from "effect/unstable/http"
    import { Gemini, layer as geminiLayer } from "@betalyra/effect-uai-google"

    const provider = Layer.unwrap(
      Effect.gen(function* () {
        const apiKey = yield* Config.redacted("GEMINI_API_KEY")
        return geminiLayer({ apiKey, model: "gemini-3-flash-preview" })
      }),
    )

    const runtime = provider.pipe(Layer.provide(FetchHttpClient.layer))

geminiLayer registers two service tags from one underlying implementation:
- Gemini - the typed tag. Yield this when you want Gemini-specific options (thinkingBudget).
- LanguageModel - the generic tag. Yield this in provider-portable code; only CommonRequestOptions is accepted at the call site.
Config

    interface Config {
      readonly apiKey: Redacted.Redacted
      readonly model: GoogleModel
      readonly baseUrl?: string // defaults to https://generativelanguage.googleapis.com/v1beta
    }

apiKey is always Redacted.Redacted, never a raw string. Read it
with Config.redacted("GEMINI_API_KEY") or wrap it manually with
Redacted.make.
baseUrl exists for proxies and self-hosted gateways that speak the
Gemini protocol. Most apps leave it unset.
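As a rough illustration of what overriding baseUrl changes, the sketch below composes a streaming URL from a base and a model ID using the public Gemini REST path (`/models/{model}:streamGenerateContent?alt=sse`). `streamEndpoint` and `DEFAULT_BASE_URL` are hypothetical names; how the provider builds its URL internally is an assumption here.

```typescript
// Hypothetical helper, shown only to illustrate the effect of baseUrl.
const DEFAULT_BASE_URL = "https://generativelanguage.googleapis.com/v1beta"

function streamEndpoint(model: string, baseUrl: string = DEFAULT_BASE_URL): string {
  // Strip trailing slashes so a proxy URL like ".../gemini/" does not yield "//models".
  const root = baseUrl.replace(/\/+$/, "")
  return `${root}/models/${encodeURIComponent(model)}:streamGenerateContent?alt=sse`
}

// Default host versus a placeholder proxy that speaks the same protocol.
const direct = streamEndpoint("gemini-2.5-flash")
const proxied = streamEndpoint("gemini-2.5-flash", "https://proxy.example.com/gemini/")
```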
Request options

    interface GeminiRequestOptions extends CommonRequestOptions {
      readonly thinkingBudget?: number
    }

On top of the core CommonRequestOptions (tools, toolChoice,
temperature, maxOutputTokens):
- thinkingBudget - Gemini 2.5+ thinking budget, forwarded as generationConfig.thinkingConfig.thinkingBudget. Set to 0 to disable thinking entirely (lowest latency, fastest first-token); higher values let the model think longer before emitting output.
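The forwarding described above might look roughly like the following. This is a sketch under assumptions: `SketchRequestOptions`, `GenerationConfig`, and `toGenerationConfig` are illustrative names, and only the nesting `generationConfig.thinkingConfig.thinkingBudget` comes from the text; the provider's real mapping code may differ.

```typescript
// Illustrative stand-ins, not the library's actual types.
interface SketchRequestOptions {
  readonly temperature?: number
  readonly maxOutputTokens?: number
  readonly thinkingBudget?: number
}

interface GenerationConfig {
  temperature?: number
  maxOutputTokens?: number
  thinkingConfig?: { thinkingBudget: number }
}

function toGenerationConfig(options: SketchRequestOptions): GenerationConfig {
  const config: GenerationConfig = {}
  if (options.temperature !== undefined) config.temperature = options.temperature
  if (options.maxOutputTokens !== undefined) config.maxOutputTokens = options.maxOutputTokens
  // 0 is meaningful (thinking disabled), so compare against undefined rather than truthiness.
  if (options.thinkingBudget !== undefined) {
    config.thinkingConfig = { thinkingBudget: options.thinkingBudget }
  }
  return config
}
```

The explicit `undefined` check matters: a truthiness test would silently drop `thinkingBudget: 0` and leave thinking enabled.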
Calling it

    import { Effect, Stream } from "effect"
    import { Gemini } from "@betalyra/effect-uai-google"

    // `history` is the conversation so far, defined elsewhere.
    const turn = Effect.gen(function* () {
      const gemini = yield* Gemini
      return gemini.streamTurn(history, { thinkingBudget: 0 })
    })

streamTurn returns Stream<TurnDelta, AiError>. Pipe it through
Loop.streamUntilComplete inside a loop body, or consume the deltas
directly for one-shot calls.
Models
GoogleModel is a literal union with a (string & {}) tail - you get
autocomplete on known IDs but can pass any string for models the SDK
hasn’t been updated for yet.
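The `(string & {})` pattern can be seen in isolation in the sketch below. `SketchModel` and `describeModel` are hypothetical stand-ins, and the three literal IDs are only a subset chosen for illustration, not the real GoogleModel definition.

```typescript
// Illustrative type, not the library's GoogleModel.
// `string & {}` still accepts any string, but stops TypeScript from collapsing
// the union to plain `string`, so editors keep suggesting the literals.
type SketchModel =
  | "gemini-2.5-pro"
  | "gemini-2.5-flash"
  | "gemini-2.5-flash-lite"
  | (string & {})

function describeModel(model: SketchModel): string {
  return `using ${model}`
}

const known = describeModel("gemini-2.5-flash") // autocompleted literal
const custom = describeModel("my-experimental-model") // arbitrary string also type-checks
```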
Known IDs (as of April 2026): gemini-3.1-pro-preview,
gemini-3-flash-preview, gemini-3.1-flash-lite-preview,
gemini-3.1-flash-live-preview, gemini-3.1-flash-tts-preview,
gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite.
Reference: Gemini models.
Errors
HTTP failures map to typed AiError variants:
| Status | Error |
|---|---|
| 429 | AiError.RateLimited |
| 408/504 | AiError.Timeout |
| 401 | AiError.AuthFailed (auth) |
| 403 | AiError.AuthFailed (permission) |
| 402 | AiError.AuthFailed (billing) |
| 413 | AiError.ContextLengthExceeded |
| >= 500 | AiError.Unavailable |
| other 4xx | AiError.InvalidRequest |
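Read top to bottom, the table behaves like an ordered set of checks, which is why 504 maps to Timeout rather than Unavailable. The sketch below encodes that reading; `classifyStatus`, `ErrorTag`, and `Classified` are hypothetical names, and the library's actual error constructors and fields are not shown in this page.

```typescript
// Illustrative classifier mirroring the table; not the provider's real code.
type ErrorTag =
  | "RateLimited"
  | "Timeout"
  | "AuthFailed"
  | "ContextLengthExceeded"
  | "Unavailable"
  | "InvalidRequest"

interface Classified {
  readonly tag: ErrorTag
  readonly reason?: "auth" | "permission" | "billing"
}

// Intended for failure statuses only (>= 400).
function classifyStatus(status: number): Classified {
  if (status === 429) return { tag: "RateLimited" }
  // Checked before the >= 500 rule so 504 is treated as a timeout.
  if (status === 408 || status === 504) return { tag: "Timeout" }
  if (status === 401) return { tag: "AuthFailed", reason: "auth" }
  if (status === 403) return { tag: "AuthFailed", reason: "permission" }
  if (status === 402) return { tag: "AuthFailed", reason: "billing" }
  if (status === 413) return { tag: "ContextLengthExceeded" }
  if (status >= 500) return { tag: "Unavailable" }
  return { tag: "InvalidRequest" }
}
```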
Recover per-tag with Stream.catchTag("RateLimited", handler). See
multi-model fallback for cross-provider
recovery between Responses and Gemini.