Skip to content

Image generation

Sometimes the answer is a picture, not a paragraph.

Product mocks, illustration drafts, thumbnail variants, image edits inside an agent’s tool loop — they all want the same shape: a prompt (plus optionally a reference image and a mask) goes in, one or more images come out. Synchronous-ish: a few seconds, occasionally tens.

The interaction archetype is one-shot, same as embeddings. Streaming intermediate images exists on a few providers but isn’t broadly supported, so the core abstraction stays simple.

Coming soon

@effect-uai/core will ship an ImageGenerator service tag covering text-to-image, image edit, and inpainting. Provider candidates:

  • OpenAIgpt-image-1, dall-e-3.
  • Google — Imagen 3 / 4 via the Gemini API and Vertex.
  • Black Forest Labs — Flux family (flux-pro, flux-dev).
  • Stability AI — Stable Diffusion family.

The output type reuses the existing MediaSource / Image domain — URL, base64, or bytes — so generated images compose with multimodal language models without extra glue.

Show interest

Open or +1 the image generation tracking issue.