Market intel
You have a list of competitor URLs and you want one clean, comparable record per vendor: name, tiers, prices, whether there is a free tier, key features. Every vendor lays its pricing page out differently, so there is no set of CSS selectors that works across all of them. This is where an LLM plus a schema beats a scraper: you describe the shape you want once, and the model fills it from whatever the page happens to look like.
Scenario. Point the recipe at the pricing pages of the web-extraction
providers themselves (Firecrawl, Exa, Tavily, ScrapingBee). One charges by
credits, one per seat, one by tier, so no two pages share a layout. Each page is
read into clean markdown, extracted into a typed Product, and the batch runs
concurrently. Pages that fail (a fetch error, a refusal, a schema mismatch)
come back as a Result.Failure instead of sinking the whole run.
Read, then extract
There is no extract primitive; the recipe composes two you already have.
WebRead.read({ url }) returns clean markdown, and a structured model turn
decodes that markdown against your Schema:

    export const extractProduct = (cfg: MarketIntelConfig, url: string) =>
      Effect.flatMap(WebRead.read({ url }), (page) => decodePage(cfg.model, page.content))

decodePage runs one LanguageModel.streamTurn({ ..., structured: productFormat })
and folds it down to the decoded Product with Turn.decodeStructured. The
schema is the contract: the model is told exactly which fields to return, and
the result is validated locally before you ever see it.
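
To make "the schema is the contract" concrete, here is a dependency-free sketch of what a typed record and its local validation might look like. The field names (`name`, `tiers`, `hasFreeTier`, `keyFeatures`) are assumptions drawn from the record described at the top of this page, and `decodeProduct` is a hypothetical hand-written stand-in for the Schema-based decoding the recipe actually uses. It is not the library's API.

```typescript
// Hypothetical shape for one vendor record; field names are assumptions.
interface Tier {
  name: string;
  price: string; // kept as text, since vendors price in different units
}

interface Product {
  name: string;
  tiers: Tier[];
  hasFreeTier: boolean;
  keyFeatures: string[];
}

type Decoded =
  | { ok: true; value: Product }
  | { ok: false; error: string };

const isRecord = (u: unknown): u is Record<string, unknown> =>
  typeof u === "object" && u !== null && !Array.isArray(u);

const isStringArray = (u: unknown): u is string[] =>
  Array.isArray(u) && u.every((x) => typeof x === "string");

const isTier = (u: unknown): u is Tier =>
  isRecord(u) && typeof u.name === "string" && typeof u.price === "string";

// Validate untrusted model output locally before handing it to the caller.
function decodeProduct(input: unknown): Decoded {
  if (!isRecord(input)) return { ok: false, error: "expected an object" };
  const { name, tiers, hasFreeTier, keyFeatures } = input;
  if (typeof name !== "string") {
    return { ok: false, error: "name: expected string" };
  }
  if (!Array.isArray(tiers) || !tiers.every(isTier)) {
    return { ok: false, error: "tiers: expected { name, price }[]" };
  }
  if (typeof hasFreeTier !== "boolean") {
    return { ok: false, error: "hasFreeTier: expected boolean" };
  }
  if (!isStringArray(keyFeatures)) {
    return { ok: false, error: "keyFeatures: expected string[]" };
  }
  return { ok: true, value: { name, tiers, hasFreeTier, keyFeatures } };
}
```

In this sketch, a payload with the wrong shape comes back as a failed decode with the name of the offending field, never as a half-filled record.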
Fan out, keep failures
The batch is one Effect.forEach with a concurrency cap. Each URL is wrapped
in Effect.result, so its outcome is captured as a Result and paired with
its URL:

    Effect.forEach(
      cfg.urls,
      (url) =>
        Effect.result(extractProduct(cfg, url)).pipe(Effect.map((r) => [url, r] as const)),
      { concurrency: cfg.concurrency },
    )

Cap concurrency to stay within your read provider’s QPS limit. One slow or broken page never
blocks the others, and every input URL gets exactly one row out.
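
The same fan-out shape can be sketched without Effect, using plain Promises. This illustrates the pattern (bounded concurrency, one captured outcome per input) and is not how `Effect.forEach` is implemented. `Outcome`, `capture`, and `mapWithConcurrency` are hypothetical names.

```typescript
// Hypothetical stand-in for a Result: every task settles to one of these.
type Outcome<A> =
  | { _tag: "Success"; value: A }
  | { _tag: "Failure"; error: unknown };

// Run a task and capture its failure as a value instead of throwing.
async function capture<A>(task: () => Promise<A>): Promise<Outcome<A>> {
  try {
    return { _tag: "Success", value: await task() };
  } catch (error) {
    return { _tag: "Failure", error };
  }
}

// Map over inputs with at most `concurrency` tasks in flight.
// Output order matches input order, and every input yields exactly one row.
async function mapWithConcurrency<I, A>(
  inputs: readonly I[],
  concurrency: number,
  f: (input: I) => Promise<A>,
): Promise<Array<readonly [I, Outcome<A>]>> {
  const rows = new Array<readonly [I, Outcome<A>]>(inputs.length);
  // Fall back to one worker if the cap is not a positive integer.
  const limit = Number.isInteger(concurrency) && concurrency > 0 ? concurrency : 1;
  let next = 0;
  const worker = async (): Promise<void> => {
    // Each worker claims the next unprocessed index until none remain.
    while (next < inputs.length) {
      const i = next++;
      rows[i] = [inputs[i], await capture(() => f(inputs[i]))] as const;
    }
  };
  const workerCount = Math.min(limit, inputs.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return rows;
}
```

Because failures are captured inside each task, `Promise.all` never rejects here, which is what keeps one broken page from sinking the rest of the batch.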
Swap providers freely
recipe.ts names no provider; it is written against the generic WebRead and
LanguageModel tags. The backends are chosen by the Layers in app.ts. The
read backend is picked by READ_PROVIDER (firecrawl, jina, exa, or
tavily), the model by the Gemini Layer. Point READ_PROVIDER at a different
backend and the recipe code does not change at all; that is the whole point of
the generic tag.
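
One way to picture the selection step is a small function that turns the READ_PROVIDER value into one of the supported backend names. This is a sketch only: `parseReadProvider` is a hypothetical name, and the real wiring lives in the Layers in app.ts. The fallback to firecrawl and the case-insensitive matching are both assumptions. The fallback is inferred from the first run command, which sets no READ_PROVIDER.

```typescript
// The backend names this page lists for READ_PROVIDER.
const READ_PROVIDERS = ["firecrawl", "jina", "exa", "tavily"] as const;
type ReadProvider = (typeof READ_PROVIDERS)[number];

// Hypothetical helper: resolve the raw env value to a known backend name.
// Assumptions: unset or empty falls back to "firecrawl"; matching ignores case.
function parseReadProvider(raw: string | undefined): ReadProvider {
  const value = (raw ?? "").trim().toLowerCase();
  if (value === "") return "firecrawl";
  const match = READ_PROVIDERS.find((p) => p === value);
  if (match === undefined) {
    throw new Error(
      `Unknown READ_PROVIDER "${raw}"; expected one of: ${READ_PROVIDERS.join(", ")}`,
    );
  }
  return match;
}
```

Resolving the name in one place like this keeps the choice of backend out of the recipe code, which only ever sees the generic interface.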
Run it

    FIRECRAWL_API_KEY=... GOOGLE_API_KEY=... pnpm tsx recipes/market-intel/run-node.ts

    # read with a different backend, same recipe (needs that provider's key):
    READ_PROVIDER=jina JINA_API_KEY=... GOOGLE_API_KEY=... pnpm tsx recipes/market-intel/run-node.ts
    READ_PROVIDER=exa EXA_API_KEY=... GOOGLE_API_KEY=... pnpm tsx recipes/market-intel/run-node.ts

    # override the pages, model, or concurrency:
    URLS="https://stripe.com/pricing,https://www.notion.so/pricing" CONCURRENCY=2 \
      FIRECRAWL_API_KEY=... GOOGLE_API_KEY=... pnpm tsx recipes/market-intel/run-node.ts

The same app.ts runs under Bun (run-bun.ts) and Deno (run-deno.ts); only
the platform HttpClient differs.
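
The URLS and CONCURRENCY overrides could be parsed along these lines. This is a sketch under assumptions: `parseOverrides` is a hypothetical helper, and the defaults are supplied by the caller because this page does not state the recipe's actual defaults. Only the two variable names and the comma-separated URL format come from the commands above.

```typescript
interface Overrides {
  urls: string[];
  concurrency: number;
}

// Hypothetical helper: read URLS and CONCURRENCY from an env-like record,
// falling back to caller-supplied defaults when a variable is unset.
function parseOverrides(
  env: Record<string, string | undefined>,
  defaults: Overrides,
): Overrides {
  const urls = (env.URLS ?? "")
    .split(",")
    .map((u) => u.trim())
    .filter((u) => u.length > 0);
  // new URL(...) throws a TypeError on malformed input, so bad entries fail fast.
  urls.forEach((u) => new URL(u));

  const rawConcurrency = env.CONCURRENCY;
  const concurrency =
    rawConcurrency === undefined ? defaults.concurrency : Number(rawConcurrency);
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new Error(`CONCURRENCY must be a positive integer, got "${rawConcurrency}"`);
  }

  return { urls: urls.length > 0 ? urls : defaults.urls, concurrency };
}
```

Validating both values up front means a typo in the environment fails before any page is fetched, not partway through a batch.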
Where to go next
- Summarize the field. Feed the typed records into one more model turn that writes a short competitive summary, so the recipe goes from extract to reason.
- Persist and diff. Store each run’s records and diff them over time to catch a competitor’s price change the day it ships.
- Server-side extract. Some read providers can run the extraction in one server-side call. When the ServerSideExtract fast-path lands, this recipe will prefer it and fall back to read plus decode, with the same typed output.
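
As a sketch of the persist-and-diff idea from the list above, the function below compares two runs and reports tier prices that changed. It is illustrative only: `PricedProduct`, `PriceChange`, and `diffPrices` are hypothetical names, the record shape is an assumption, prices are compared as plain strings, and storage is left out entirely.

```typescript
// Hypothetical minimal record: vendor name plus a tier-name -> price-text map.
interface PricedProduct {
  name: string;
  tiers: Record<string, string>;
}

interface PriceChange {
  vendor: string;
  tier: string;
  before: string | undefined; // undefined: the tier is new in this run
  after: string | undefined; // undefined: the tier was removed in this run
}

// Compare the previous run's records with the current run's, vendor by vendor.
// Vendors missing from the current run are not reported in this sketch.
function diffPrices(
  previous: readonly PricedProduct[],
  current: readonly PricedProduct[],
): PriceChange[] {
  const before = new Map(previous.map((p) => [p.name, p.tiers] as const));
  const changes: PriceChange[] = [];
  for (const product of current) {
    const oldTiers: Record<string, string> = before.get(product.name) ?? {};
    const tierNames = Array.from(
      new Set([...Object.keys(oldTiers), ...Object.keys(product.tiers)]),
    );
    for (const tier of tierNames) {
      const was: string | undefined = oldTiers[tier];
      const now: string | undefined = product.tiers[tier];
      if (was !== now) {
        changes.push({ vendor: product.name, tier, before: was, after: now });
      }
    }
  }
  return changes;
}
```

Comparing price text directly is the simplest choice; normalizing currency and billing period first would reduce false positives when a vendor only rewords its page.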