AI features overview
Enable AI extraction with one line and bring your own LLM.
AI in WebReaper is opt-in and composable. One call, .UseAi(...), wires a set of
AI behaviors into the normal extraction pipeline, and you bring your own model
through Microsoft.Extensions.AI. Deterministic extraction still runs first
wherever it can, so you pay for the model only when it adds value.
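The deterministic-first idea can be sketched in plain C#, independent of WebReaper. This is an illustration under stated assumptions, not the library's implementation: `FallbackExtractor`, `ExtractAsync`, and both delegates are hypothetical names, and the "model" is any async function you supply.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical illustration only: none of these names come from WebReaper.
public static class FallbackExtractor
{
    // Tries the cheap deterministic extractor first and calls the
    // (potentially paid) model only when that yields nothing usable.
    public static async Task<(string? Value, bool UsedModel)> ExtractAsync(
        string html,
        Func<string, string?> deterministic,
        Func<string, Task<string?>> model)
    {
        var value = deterministic(html);
        if (!string.IsNullOrWhiteSpace(value))
        {
            return (value, false); // deterministic hit: the model is never called
        }

        return (await model(html), true); // deterministic miss: fall back to the model
    }
}
```

When the deterministic delegate returns a non-empty string, the model delegate is never invoked, which is the cost property described above.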
The one-liner
Pass your chat client and a policy:
using WebReaper.Builders;

var engine = await ScraperEngineBuilder
    .Crawl("https://example.com")
    .Extract(Article.Schema)
    .UseAi(chatClient, new AiOptions(Policy: AiPolicyMode.Recommended))
    .WriteToJsonFile("articles.json")
    .BuildAsync();
await engine.RunAsync();
The AI capabilities live in the WebReaper.AI satellite package, so add it
alongside the core package:
dotnet add package WebReaper.AI
Policy modes
AiPolicyMode selects which AI behaviors get wired in:
- Recommended: the balanced default. It wires a deterministic-first extractor with an LLM fallback, a self-healing selector repairer, and an action resolver.
- LlmPrimary: the LLM does the extraction every time, replacing the deterministic extractor.
- ExtractionOnly: wires the extractor behavior only, with no action resolver.
- Inferred: infers the schema from your goal at runtime instead of requiring one up front (pair it with .ExtractInferred(goal)).
- None: an escape hatch that wires nothing on the scraper, for tests and bespoke compositions.
Bring your own LLM
WebReaper never ships a model or a hosted endpoint. You construct an
IChatClient from Microsoft.Extensions.AI and hand it over. That means any
supported provider works the same way:
- OpenAI
- Anthropic
- Azure OpenAI
- Ollama for local models
You point at your provider, you authenticate with your provider, and you pay your provider directly.
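As a rough sketch, constructing an IChatClient might look like the following. It assumes the Microsoft.Extensions.AI.OpenAI and OllamaSharp NuGet packages, an OPENAI_API_KEY environment variable, and placeholder model names. The AsIChatClient() adapter was named differently in earlier preview releases of Microsoft.Extensions.AI, so check the version you have installed.

```csharp
using System;
using Microsoft.Extensions.AI;
using OllamaSharp;
using OpenAI;

// Hosted provider: wrap the OpenAI SDK's chat client as an IChatClient.
// Assumes the Microsoft.Extensions.AI.OpenAI package and an API key in the environment.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("Set OPENAI_API_KEY first.");
IChatClient openAiClient = new OpenAIClient(apiKey)
    .GetChatClient("gpt-4o-mini") // placeholder model name
    .AsIChatClient();

// Local provider: OllamaSharp's client implements IChatClient directly.
// Assumes an Ollama server on its default port with the model already pulled.
IChatClient ollamaClient = new OllamaApiClient(
    new Uri("http://localhost:11434"), "llama3.1"); // placeholder model name
```

Either instance can then be passed as the chatClient argument to .UseAi(...).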
Going deeper
.UseAi(...) is the convenient front door, but every behavior also has an
à la carte method if you want fine control:
.WithLlmFallback(...), .WithLlmSelfHealing(...),
.WithLlmSchemaInferrer(...), and .WithLlmActionResolver(...). The
LLM extraction and fallback page walks through extraction
specifically, and the autonomous agent page covers
letting a model drive the crawl itself.