
Google Gemini

The Gemini provider wraps Google’s streamGenerateContent SSE endpoint and maps it onto the core LanguageModelService shape. Thinking budget is a first-class option for the 2.5+ model line.
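To show roughly what "wraps the SSE endpoint" involves at the wire level, here is a minimal sketch. It splits a fully buffered SSE body into `data:` events, parses each event as JSON, and pulls visible text out of `candidates[0].content.parts`. The response field names follow Google's public REST `GenerateContentResponse` shape. The helper names (`parseSseFrames`, `extractText`) are hypothetical, and this is not the package's actual implementation. A real provider parses incrementally as bytes arrive rather than from one complete string.

```typescript
// Minimal slice of Gemini's REST GenerateContentResponse (only the fields used here).
interface Part {
  readonly text?: string
  readonly thought?: boolean // set on thought-summary parts when thoughts are requested
}

interface GenerateContentResponse {
  readonly candidates?: ReadonlyArray<{
    readonly content?: { readonly parts?: ReadonlyArray<Part> }
    readonly finishReason?: string
  }>
}

// Hypothetical helper: split a complete SSE body into the JSON payloads of its events.
// Events are separated by a blank line. Within an event, "data:" lines are joined with "\n",
// and one optional space after the colon is dropped, as the SSE spec describes.
function parseSseFrames(body: string): GenerateContentResponse[] {
  const frames: GenerateContentResponse[] = []
  for (const event of body.split(/\r?\n\r?\n/)) {
    const data = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n")
    if (data.length > 0) frames.push(JSON.parse(data) as GenerateContentResponse)
  }
  return frames
}

// Hypothetical helper: the visible (non-thought) text of the first candidate in one event.
function extractText(frame: GenerateContentResponse): string {
  const parts = frame.candidates?.[0]?.content?.parts ?? []
  return parts
    .filter((p) => p.thought !== true)
    .map((p) => p.text ?? "")
    .join("")
}

// Two streamed events that together spell one reply.
const body =
  'data: {"candidates":[{"content":{"parts":[{"text":"Hel"}]}}]}\r\n\r\n' +
  'data: {"candidates":[{"content":{"parts":[{"text":"lo"}]},"finishReason":"STOP"}]}\r\n\r\n'

const text = parseSseFrames(body).map(extractText).join("")
```

Each parsed event is roughly what becomes one or more deltas on the provider's output stream.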

Install

pnpm add @effect-uai/core @effect-uai/google effect

Wire it up

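import { Config, Effect, Layer } from "effect"
import { FetchHttpClient } from "effect/unstable/http"
import { Gemini, layer as geminiLayer } from "@effect-uai/google"
// Build the provider layer inside an Effect so the key is read from Config when the layer is built.
const provider = Layer.unwrap(
  Effect.gen(function* () {
    // Read GEMINI_API_KEY as a Redacted value so it is masked if logged.
    const apiKey = yield* Config.redacted("GEMINI_API_KEY")
    return geminiLayer({ apiKey })
  }),
)
// The provider makes HTTP calls, so supply a fetch-based HttpClient.
const mainLayer = provider.pipe(Layer.provide(FetchHttpClient.layer))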

geminiLayer registers two service tags from one underlying implementation:

  • Gemini - the typed tag. Yield this when you want Gemini-specific options (thinkingBudget).
  • LanguageModel - the generic tag. Yield this in provider-portable code; only CommonRequestOptions is accepted at the call site.
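The typing idea behind "two tags, one implementation" can be sketched with plain interfaces. One object accepts the full Gemini option set, and the same object is also usable through a narrower generic view that rejects Gemini-only fields at compile time. All names here are hypothetical stand-ins. `describe` plays the role of `streamTurn`, and in Effect the analogous step would presumably be providing one value under both context tags, which this sketch does not model.

```typescript
// Hypothetical stand-ins for the package's option types.
interface CommonRequestOptions {
  readonly model: string
  readonly temperature?: number
}

interface GeminiOptions extends CommonRequestOptions {
  readonly thinkingBudget?: number // Gemini-only knob
}

// Typed view: accepts the full Gemini option set.
interface GeminiService {
  readonly describe: (req: GeminiOptions) => string
}

// Generic view: accepts common options only.
interface LanguageModelService {
  readonly describe: (req: CommonRequestOptions) => string
}

// One implementation...
const impl: GeminiService = {
  describe: (req) =>
    `model=${req.model} thinkingBudget=${req.thinkingBudget ?? "default"}`,
}

// ...exposed under both views. The second assignment type-checks because every
// CommonRequestOptions value is also a valid GeminiOptions (the extra field is optional).
const asGemini: GeminiService = impl
const asGeneric: LanguageModelService = impl

// asGeneric.describe({ model: "gemini-2.5-flash", thinkingBudget: 0 })
// ^ compile error: object literal may only specify known properties
```

Portable code written against the generic view therefore cannot accidentally depend on Gemini-only options.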

Config

interface Config {
  readonly apiKey: Redacted.Redacted
  readonly baseUrl?: string // defaults to https://generativelanguage.googleapis.com/v1beta
}

The layer carries connection details only; model is chosen per call (see below). apiKey is always a Redacted.Redacted, never a raw string. Read it with Config.redacted("GEMINI_API_KEY"), or wrap a string manually with Redacted.make.

baseUrl exists for proxies and self-hosted gateways that speak the Gemini protocol. Most apps leave it unset.
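Here is a sketch of how a baseUrl override might combine with the model into a request target, assuming the provider follows Google's public REST layout (`{base}/models/{model}:streamGenerateContent?alt=sse`, key sent in the `x-goog-api-key` header). Whether this package builds URLs and headers exactly this way is an assumption. `streamUrl`, `streamHeaders`, and the proxy host are hypothetical.

```typescript
const DEFAULT_BASE_URL = "https://generativelanguage.googleapis.com/v1beta"

// Hypothetical helper: the streaming endpoint for one model. `alt=sse` asks Google
// for server-sent events instead of a streamed JSON array.
function streamUrl(model: string, baseUrl: string = DEFAULT_BASE_URL): string {
  const base = baseUrl.replace(/\/+$/, "") // tolerate a trailing slash on overrides
  return `${base}/models/${encodeURIComponent(model)}:streamGenerateContent?alt=sse`
}

// Hypothetical helper: Google's REST docs pass the key in the x-goog-api-key header.
// Whether this provider does exactly this is an assumption.
function streamHeaders(apiKey: string): Record<string, string> {
  return { "content-type": "application/json", "x-goog-api-key": apiKey }
}

const direct = streamUrl("gemini-2.5-flash")
const viaProxy = streamUrl("gemini-2.5-flash", "https://proxy.example.com/gemini/") // hypothetical gateway
```

A gateway that "speaks the Gemini protocol" only has to accept the same path shape under its own host.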

Request shape

interface GeminiRequest extends Omit<CommonRequest, "model"> {
  readonly model: GoogleModel // narrows CommonRequest.model: string
  readonly thinkingBudget?: number
}

Relative to the core CommonRequest (history, model, tools, toolChoice, temperature, maxOutputTokens), GeminiRequest narrows one field and adds one:

  • model - typed against GoogleModel for autocomplete at the call site.
  • thinkingBudget - thinking budget for Gemini 2.5+ models, forwarded as generationConfig.thinkingConfig.thinkingBudget. Set it to 0 to disable thinking entirely (lowest latency, fastest time to first token); higher values let the model think longer before emitting output.
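Here is a sketch of the forwarding described above, mapping request fields onto Gemini's REST `generationConfig` (`temperature`, `maxOutputTokens`, `thinkingConfig.thinkingBudget` are the public field names). The trimmed request type and `toGenerationConfig` are hypothetical. The real provider also serializes history and tools, and puts the model in the URL path rather than the body. The detail worth copying is the `undefined` check: `0` is a meaningful budget, so a falsy check would silently drop it.

```typescript
// Hypothetical, trimmed-down request type; the real GeminiRequest also carries history and tools.
interface GeminiRequestSketch {
  readonly model: string // goes into the URL path, not generationConfig
  readonly temperature?: number
  readonly maxOutputTokens?: number
  readonly thinkingBudget?: number
}

interface GenerationConfig {
  temperature?: number
  maxOutputTokens?: number
  thinkingConfig?: { thinkingBudget: number }
}

// Forward only the fields the caller set, so Google's server-side defaults apply otherwise.
function toGenerationConfig(req: GeminiRequestSketch): GenerationConfig {
  const config: GenerationConfig = {}
  if (req.temperature !== undefined) config.temperature = req.temperature
  if (req.maxOutputTokens !== undefined) config.maxOutputTokens = req.maxOutputTokens
  // thinkingBudget: 0 disables thinking, so test for undefined rather than falsiness.
  if (req.thinkingBudget !== undefined) {
    config.thinkingConfig = { thinkingBudget: req.thinkingBudget }
  }
  return config
}

const fast = toGenerationConfig({ model: "gemini-2.5-flash", thinkingBudget: 0 })
const defaults = toGenerationConfig({ model: "gemini-2.5-flash" })
```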

Calling it

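import { Effect, Stream } from "effect"
import { Gemini } from "@effect-uai/google"
const turn = Effect.gen(function* () {
  const gemini = yield* Gemini
  // history: the conversation so far, assumed to be defined elsewhere
  return gemini.streamTurn({
    history,
    model: "gemini-2.5-flash",
    thinkingBudget: 0, // Gemini-only option: skip thinking for lowest latency
  })
})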

streamTurn returns Stream<TurnDelta, AiError>. Pipe it through Loop.onTurnComplete inside a loop body, or consume the deltas directly for one-shot calls.

Models

GoogleModel is a literal union with a (string & {}) tail - you get autocomplete on known IDs but can pass any string for models the SDK hasn’t been updated for yet.

Known IDs (as of April 2026): gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3.1-flash-lite-preview, gemini-3.1-flash-live-preview, gemini-3.1-flash-tts-preview, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite. Reference: Gemini models.
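Here is a sketch of the "(string & {}) tail" pattern using a cut-down list of the IDs above. `KnownModel`, `ModelId`, `isKnownModel`, and the unlisted ID are hypothetical names for illustration. A plain `KnownModel | string` would collapse to `string` and lose autocomplete. Intersecting with `{}` keeps the union distinct, so editors still suggest the literals.

```typescript
// Cut-down copy of the pattern; the real GoogleModel lists more IDs.
type KnownModel = "gemini-2.5-pro" | "gemini-2.5-flash" | "gemini-2.5-flash-lite"

// `KnownModel | string` would collapse to plain `string`;
// intersecting with `{}` keeps the literals visible to the editor.
type ModelId = KnownModel | (string & {})

const KNOWN_MODELS: ReadonlyArray<KnownModel> = [
  "gemini-2.5-pro",
  "gemini-2.5-flash",
  "gemini-2.5-flash-lite",
]

// Narrow at runtime, e.g. to log when a caller uses an ID this list does not include.
function isKnownModel(model: ModelId): model is KnownModel {
  return (KNOWN_MODELS as ReadonlyArray<string>).indexOf(model) !== -1
}

const listed: ModelId = "gemini-2.5-flash" // editors offer the literals here
const unlisted: ModelId = "gemini-99-hypothetical" // still compiles: any string is accepted
```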

Errors

HTTP failures map to typed AiError variants:

| Status | Error |
|---|---|
| 429 | AiError.RateLimited |
| 408 / 504 | AiError.Timeout |
| 401 | AiError.AuthFailed (auth) |
| 403 | AiError.AuthFailed (permission) |
| 402 | AiError.AuthFailed (billing) |
| 413 | AiError.ContextLengthExceeded |
| >= 500 (other than 504) | AiError.Unavailable |
| other 4xx | AiError.InvalidRequest |
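The status table can be read as a small function. Here is a sketch of it. The `_tag` field mirrors Effect's tagged-error convention that `Stream.catchTag` matches on, but `AiErrorSketch`, `fromStatus`, and the `reason` field are hypothetical. The real `AiError.AuthFailed` may encode auth/permission/billing differently, and the real variants may carry more data. Note the ordering: 504 is matched before the generic `>= 500` branch, so it stays a Timeout.

```typescript
type AiErrorTag =
  | "RateLimited"
  | "Timeout"
  | "AuthFailed"
  | "ContextLengthExceeded"
  | "Unavailable"
  | "InvalidRequest"

interface AiErrorSketch {
  readonly _tag: AiErrorTag
  readonly status: number
  readonly reason?: "auth" | "permission" | "billing" // only set for AuthFailed
}

// Hypothetical mapper following the table row by row; expects an HTTP failure status.
function fromStatus(status: number): AiErrorSketch {
  if (status < 400) throw new RangeError(`not an HTTP failure status: ${status}`)
  switch (status) {
    case 429:
      return { _tag: "RateLimited", status }
    case 408:
    case 504:
      return { _tag: "Timeout", status }
    case 401:
      return { _tag: "AuthFailed", status, reason: "auth" }
    case 403:
      return { _tag: "AuthFailed", status, reason: "permission" }
    case 402:
      return { _tag: "AuthFailed", status, reason: "billing" }
    case 413:
      return { _tag: "ContextLengthExceeded", status }
  }
  // 504 was matched above, so it stays a Timeout rather than Unavailable.
  if (status >= 500) return { _tag: "Unavailable", status }
  return { _tag: "InvalidRequest", status } // every remaining 4xx
}
```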

Recover per-tag with Stream.catchTag("RateLimited", handler). See multi-model fallback for cross-provider recovery between Responses and Gemini.