Google Gemini

The Gemini provider wraps Google’s streamGenerateContent SSE endpoint and maps it onto the core LanguageModelService shape. Thinking budget is a first-class option for the 2.5+ model line.

Install

pnpm add @betalyra/effect-uai-core @betalyra/effect-uai-google effect

Wire it up

import { Config, Effect, Layer } from "effect"
import { FetchHttpClient } from "effect/unstable/http"
import { Gemini, layer as geminiLayer } from "@betalyra/effect-uai-google"
// Build the provider layer inside an Effect so the API key is read from config as a Redacted value.
const provider = Layer.unwrap(
  Effect.gen(function* () {
    const apiKey = yield* Config.redacted("GEMINI_API_KEY")
    return geminiLayer({ apiKey, model: "gemini-3-flash-preview" })
  }),
)
// The provider needs an HttpClient; supply the fetch-based implementation.
const runtime = provider.pipe(Layer.provide(FetchHttpClient.layer))

geminiLayer registers two service tags from one underlying implementation:

  • Gemini - the typed tag. Yield this when you want Gemini-specific options (thinkingBudget).
  • LanguageModel - the generic tag. Yield this in provider-portable code; only CommonRequestOptions is accepted at the call site.

Config

interface Config {
  readonly apiKey: Redacted.Redacted
  readonly model: GoogleModel
  readonly baseUrl?: string // defaults to https://generativelanguage.googleapis.com/v1beta
}

apiKey is always a Redacted.Redacted, never a raw string. Read it with Config.redacted("GEMINI_API_KEY") or wrap a string manually with Redacted.make.

baseUrl exists for proxies and self-hosted gateways that speak the Gemini protocol. Most apps leave it unset.
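To illustrate what overriding baseUrl changes, here is a minimal sketch of how a streaming endpoint URL could be composed from it. The helper name streamUrl and the gateway host are hypothetical, and the provider's internal URL handling may differ; the path shape follows the public Gemini REST API, where ?alt=sse requests server-sent events.

```typescript
// Default taken from the Config comment above.
const DEFAULT_BASE_URL = "https://generativelanguage.googleapis.com/v1beta"

// Hypothetical helper (not part of the package): builds the streaming
// endpoint URL for a model under a given base URL.
function streamUrl(model: string, baseUrl: string = DEFAULT_BASE_URL): string {
  // Drop trailing slashes so a proxy URL ending in "/" still joins cleanly.
  const root = baseUrl.replace(/\/+$/, "")
  return `${root}/models/${encodeURIComponent(model)}:streamGenerateContent?alt=sse`
}

// Default host versus a hypothetical self-hosted gateway.
const direct = streamUrl("gemini-2.5-flash")
const proxied = streamUrl("gemini-2.5-flash", "https://gw.example/v1beta/")
```

Only the prefix differs between the two results, which is why a gateway has to speak the same protocol for the override to work.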

Request options

interface GeminiRequestOptions extends CommonRequestOptions {
  readonly thinkingBudget?: number
}

On top of the core CommonRequestOptions (tools, toolChoice, temperature, maxOutputTokens):

  • thinkingBudget - Gemini 2.5+ thinking budget, forwarded as generationConfig.thinkingConfig.thinkingBudget. Set to 0 to disable thinking entirely (lowest latency, fastest first-token); higher values let the model think longer before emitting output.
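The forwarding described above can be sketched as a small mapping function. This is an illustration under assumptions, not the provider's actual code: RequestOptionsSketch, GenerationConfigSketch, and toGenerationConfig are hypothetical names, and only three options are modelled.

```typescript
// Hypothetical subset of the request options, for illustration only.
interface RequestOptionsSketch {
  readonly temperature?: number
  readonly maxOutputTokens?: number
  readonly thinkingBudget?: number
}

// Matching subset of the generationConfig object in the Gemini request body.
interface GenerationConfigSketch {
  temperature?: number
  maxOutputTokens?: number
  thinkingConfig?: { thinkingBudget: number }
}

function toGenerationConfig(options: RequestOptionsSketch): GenerationConfigSketch {
  const config: GenerationConfigSketch = {}
  if (options.temperature !== undefined) config.temperature = options.temperature
  if (options.maxOutputTokens !== undefined) config.maxOutputTokens = options.maxOutputTokens
  // 0 is meaningful (thinking disabled), so compare against undefined
  // rather than relying on truthiness.
  if (options.thinkingBudget !== undefined) {
    config.thinkingConfig = { thinkingBudget: options.thinkingBudget }
  }
  return config
}

const disabled = toGenerationConfig({ thinkingBudget: 0 })
const unset = toGenerationConfig({ temperature: 0.2 })
```

The explicit undefined check matters: a truthiness test would silently drop thinkingBudget: 0 and leave thinking at the model default.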

Calling it

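import { Effect, Stream } from "effect"
import { Gemini } from "@betalyra/effect-uai-google"
// `history` is the conversation so far, defined elsewhere in your program.
const turn = Effect.gen(function* () {
  const gemini = yield* Gemini
  // Yielding the typed tag makes Gemini-specific options such as thinkingBudget available.
  return gemini.streamTurn(history, { thinkingBudget: 0 })
})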

streamTurn returns Stream<TurnDelta, AiError>. Pipe it through Loop.streamUntilComplete inside a loop body, or consume the deltas directly for one-shot calls.

Models

GoogleModel is a literal union with a (string & {}) tail - you get autocomplete on known IDs but can pass any string for models the SDK hasn’t been updated for yet.

Known IDs (as of April 2026): gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-3.1-flash-live-preview, gemini-3.1-flash-tts-preview, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite. Reference: Gemini models.
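The (string & {}) tail can be sketched in isolation. The type names below are hypothetical and the union lists only three of the known IDs; the package's real GoogleModel is the authoritative definition.

```typescript
// A few of the known IDs listed above, for illustration only.
type KnownModelSketch = "gemini-2.5-pro" | "gemini-2.5-flash" | "gemini-2.5-flash-lite"

// `string & {}` accepts any string but is not simplified away, so the
// literals stay visible to editor autocomplete. A plain `| string` would
// collapse the whole union to `string`.
type ModelSketch = KnownModelSketch | (string & {})

// A known ID: offered by autocomplete.
const known: ModelSketch = "gemini-2.5-flash"

// A hypothetical ID the union does not list: still type-checks.
const custom: ModelSketch = "some-future-model-id"
```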

Errors

HTTP failures map to typed AiError variants:

| Status | Error |
| --- | --- |
| 429 | AiError.RateLimited |
| 408/504 | AiError.Timeout |
| 401 | AiError.AuthFailed (auth) |
| 403 | AiError.AuthFailed (permission) |
| 402 | AiError.AuthFailed (billing) |
| 413 | AiError.ContextLengthExceeded |
| >= 500 | AiError.Unavailable |
| other 4xx | AiError.InvalidRequest |
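Read as code, the mapping above amounts to an ordered series of checks. The sketch below is an illustration, not the provider's implementation: tagForStatus and ErrorTagSketch are hypothetical names, and the function assumes it is only called for failed (non-2xx) responses.

```typescript
// Tags taken from the mapping above; the type name is hypothetical.
type ErrorTagSketch =
  | "RateLimited"
  | "Timeout"
  | "AuthFailed"
  | "ContextLengthExceeded"
  | "Unavailable"
  | "InvalidRequest"

// Hypothetical helper: maps an HTTP failure status to an error tag.
function tagForStatus(status: number): ErrorTagSketch {
  if (status === 429) return "RateLimited"
  // 504 must be tested before the generic >= 500 branch below.
  if (status === 408 || status === 504) return "Timeout"
  // Auth, billing, and permission failures share one tag.
  if (status === 401 || status === 402 || status === 403) return "AuthFailed"
  if (status === 413) return "ContextLengthExceeded"
  if (status >= 500) return "Unavailable"
  // Everything else is treated as a malformed or unsupported request.
  return "InvalidRequest"
}

const gatewayTimeout = tagForStatus(504)
const serverError = tagForStatus(503)
const badRequest = tagForStatus(400)
```

Ordering is the only subtle part: 504 falls in the >= 500 range, so the timeout check has to come first for it to map to Timeout rather than Unavailable.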

Recover per-tag with Stream.catchTag("RateLimited", handler). See multi-model fallback for cross-provider recovery between Responses and Gemini.