Library quickstart
Build and run a scraper in .NET with the fluent ScraperEngineBuilder.
The WebReaper library is built around a fluent builder. You start from a crawl seed, choose how to extract each page, pick where the results go, then build and run the engine. Everything has sensible in-memory defaults, so a working scraper is a handful of chained calls.
Add the package
dotnet add package WebReaper
Your first engine
Crawl a site, convert each page to Markdown, and print it to the console:
using WebReaper.Builders;
var engine = await ScraperEngineBuilder
    .Crawl("https://news.ycombinator.com")
    .AsMarkdown()
    .WriteToConsole()
    .BuildAsync();
await engine.RunAsync();
Crawl(url) is the seed. AsMarkdown() is the extraction strategy.
WriteToConsole() is the sink. BuildAsync() constructs a runnable engine and
RunAsync() drives the crawl.
Pick a seed
Two seeds are available. Which one to use depends on whether the pages need a browser to render:
// Plain HTTP loading (the default, fastest path)
ScraperEngineBuilder.Crawl("https://example.com")
// Render pages with a real browser for JS-heavy sites
ScraperEngineBuilder.CrawlWithBrowser("https://example.com")
Choose an extraction strategy
The seed offers three terminals, each returning the builder:
// 1. Clean, LLM-ready Markdown, no schema required
.AsMarkdown()
// 2. Deterministic structured extraction against a schema
.Extract(Article.Schema)
// 3. Infer the schema at runtime from a plain-language goal (AI)
.ExtractInferred("product name, price, and rating")
Extract pairs with a schema; the schema extraction
page shows how to define one with the source generator. ExtractInferred
needs an LLM, covered in LLM extraction and fallback.
Send results to a sink
Print to the console, or write to a file:
// Console
.WriteToConsole()
// JSON Lines file (one object per line)
.WriteToJsonFile("results.json")
Note that WriteToJsonFile writes JSON Lines (one object per line, not a JSON
array) and wipes the file on start by default.
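Because the output is JSON Lines and not a single JSON array, deserializing the whole file in one call will fail; each line has to be parsed on its own. The sketch below shows one way to read such a file back with System.Text.Json from the .NET base library. It is not part of WebReaper: the ReadJsonLines helper is a hypothetical name introduced here for illustration, and the shape of each record depends on the extraction strategy you chose.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical helper (not a WebReaper API): parse a JSON Lines file,
// treating every non-empty line as one complete JSON value.
static List<JsonElement> ReadJsonLines(string filePath)
{
    var records = new List<JsonElement>();
    foreach (var line in File.ReadLines(filePath))
    {
        if (string.IsNullOrWhiteSpace(line)) continue; // tolerate blank lines
        using var doc = JsonDocument.Parse(line);
        // Clone so the element stays valid after the document is disposed.
        records.Add(doc.RootElement.Clone());
    }
    return records;
}

// "results.json" is the file name used in the sink example above.
const string path = "results.json";
if (File.Exists(path))
{
    var records = ReadJsonLines(path);
    Console.WriteLine($"Read {records.Count} records");
}
```

File.ReadLines streams the file lazily, so a large crawl output does not need to fit in memory as a single string before parsing starts.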
Build and run
var engine = await ScraperEngineBuilder
    .CrawlWithBrowser("https://example.com")
    .Extract(Article.Schema)
    .WriteToJsonFile("articles.json")
    .BuildAsync();
await engine.RunAsync();
From here, add an LLM, a browser or stealth backend, or a distributed backend with a few more chained calls.