
Clean structured datasets

Typed in, typed out, AOT all the way.

Define a typed POCO, get reflection-free extraction, and ship a native binary that emits JSON or CSV.

When the goal is a dataset, not a summary, you want typed records with stable fields: a row per product, a column per attribute, ready to load into a warehouse or a notebook. Hand-written selector glue and runtime reflection get in the way, especially if you want to ship the scraper as a small, fast binary.

Describe the shape as a POCO

Mark a class for source generation and annotate each field with its selector. No parsing code, no mapping layer:

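// Opts the class into source generation; it must be declared partial.
[ScrapeSchema]
public partial class Product
{
    // Each attribute carries the CSS selector for the field.
    [ScrapeField(".product-title")]
    public string Title { get; set; }

    // Extracted as an integer rather than as text.
    [ScrapeField(".price", Type = SchemaFieldType.Integer)]
    public int Price { get; set; }

    [ScrapeField(".availability")]
    public string Availability { get; set; }
}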

At build time the Roslyn source generator emits two members on the partial class: a Product.Schema describing the selectors and a Product.Materialize(JsonObject) that builds a typed instance with no reflection. The class must be partial and each scraped property needs a public setter.
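The generator's actual output is not shown on this page. As a rough illustration of what reflection-free materialization means, here is a hand-written stand-in. The class name ProductSketch and the JSON keys "title", "price", and "availability" are assumptions made for this sketch, not WebReaper's real naming; only System.Text.Json.Nodes is used.

```csharp
using System;
using System.Text.Json.Nodes;

// Build a row by hand, standing in for what an extraction step might produce.
var row = new JsonObject
{
    ["title"] = "Desk lamp",
    ["price"] = 1999,
    ["availability"] = "In stock",
};

var product = ProductSketch.Materialize(row);
Console.WriteLine($"{product.Title}: {product.Price} ({product.Availability})");

// Hypothetical stand-in for generated code; not the library's actual output.
public sealed class ProductSketch
{
    public string Title { get; set; } = "";
    public int Price { get; set; }
    public string Availability { get; set; } = "";

    // Every field is read by name and assigned directly, so there is
    // no reflection for the AOT compiler or trimmer to trip over.
    public static ProductSketch Materialize(JsonObject json) => new ProductSketch
    {
        Title = json["title"]?.GetValue<string>() ?? "",
        Price = json["price"]?.GetValue<int>() ?? 0,
        Availability = json["availability"]?.GetValue<string>() ?? "",
    };
}
```

Because the mapping is ordinary property assignment, a missing key falls back to a default here instead of throwing; whether the real generated code behaves that way is not stated in the source.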

Wire it into a scrape

Pass the generated schema to .Extract(...) and choose a file sink:

using WebReaper.Builders;
 
var engine = await ScraperEngineBuilder
    .Crawl("https://shop.example.com")
    .Extract(Product.Schema)
    .WriteToJsonFile("products.jsonl")   // one JSON object per line
    .BuildAsync();
 
await engine.RunAsync();

Prefer columns? Swap one call:

    .WriteToCsvFile("products.csv")

The JSON file sink writes JSON Lines (one object per line), a format that streams and appends cleanly on large runs. One default to be aware of: WriteToJsonFile wipes its target file on start, unlike the other sinks.
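Since each line is a complete JSON object, the output can be read back one row at a time without loading the whole file. A minimal sketch, assuming rows keyed by "title" and "price" (hypothetical key names); the JsonLines helper is illustrative and not part of WebReaper, and the sample file is written first so the snippet runs without a prior scrape.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json.Nodes;

// Write two sample rows so the sketch is runnable on its own.
var path = Path.Combine(Path.GetTempPath(), "products-sample.jsonl");
File.WriteAllLines(path, new[]
{
    "{\"title\":\"Desk lamp\",\"price\":1999}",
    "{\"title\":\"Bookend\",\"price\":850}",
});

foreach (var row in JsonLines.Read(path))
{
    Console.WriteLine($"{row["title"]}: {row["price"]}");
}

// Hypothetical helper: lazily yields one JsonObject per non-empty line.
public static class JsonLines
{
    public static IEnumerable<JsonObject> Read(string path)
    {
        // File.ReadLines streams the file instead of buffering it all.
        foreach (var line in File.ReadLines(path))
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            if (JsonNode.Parse(line) is JsonObject row) yield return row;
        }
    }
}
```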

Reflection-free, AOT-clean

Because materialization is generated rather than reflected, the extraction path contains no runtime reflection, so the whole scraper can be compiled ahead-of-time into a single native binary. That means fast cold starts and a self-contained artifact you can drop onto a server, a container, or a CI runner with no .NET runtime to install. The CLI ships exactly this way.

The payoff

You write a plain C# class and get back a clean, typed dataset as JSON Lines or CSV, produced by a fast native binary with no reflection and no parsing boilerplate. The shape of your data lives in one place: the POCO.
