Scraping Cloudflare-protected sites from .NET
A practical walkthrough of WebReaper's browser and auto-stealth escalation for sites behind Cloudflare, DataDome, and other bot managers.
A plain HTTP request to a protected site does not return the page. It returns an interstitial: a challenge, a 403, or a polite block telling you to enable JavaScript. Cloudflare, DataDome, PerimeterX, Incapsula, and Akamai all play some version of this game, and getting past the easy ones is mostly about looking like a real browser.
WebReaper handles this as a ladder you climb only as far as you need. Start with a plain request, escalate to a real browser, and escalate again to a stealth browser when the site demands it. This post walks the ladder from the CLI, then shows the library equivalent.
Step one: try a real browser
If a site fails a plain fetch but is not running aggressive fingerprinting, a real headless browser is often enough, because the page just needs JavaScript to execute. A plain scrape already climbs to a browser on its own when a page looks blocked, so the flag below just starts there and skips the HTTP probe.
webreaper scrape https://protected.example.com --browser --output page.md
The first browser run downloads its browser binaries, so expect a one-time delay. After that, the page renders in a real engine, scripts run, and the challenge that blocked your HTTP client resolves on its own. For a large share of "this site is blocking me" cases, the story ends here.
Step two: let it climb to stealth
Some sites look harder at who is knocking. When a real browser still comes back with a 403, a 429, a 503, a challenge response header, or an empty result on a page carrying a known challenge marker, WebReaper climbs one more rung to a stealth Chromium backend tuned to pass those checks.
webreaper scrape https://protected.example.com --stealth
--stealth starts the climb at the stealth backend; a plain scrape blocked at a
vanilla browser climbs there on its own. The climb happens inside a single load,
per page: WebReaper detects the block (a challenge status, a response header, or
a body marker, covering Cloudflare, DataDome, PerimeterX, Incapsula, and Akamai),
escalates to the next rung, reloads, and returns the best result it reached. A
page still blocked at the top rung is dropped rather than emitted as a challenge
page, and the run exits non-zero so you know.
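The detect-and-escalate loop described above can be sketched in a few lines of shell. This is an illustration of the idea, not WebReaper's implementation: looks_blocked, climb, and fetch_rung are hypothetical names, the header and body marker are Cloudflare examples only, and the sketch returns the first unblocked result rather than tracking a "best" one.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of block detection and rung escalation.
# Not WebReaper's code; names and markers here are assumptions.

# looks_blocked STATUS HEADERS BODY -> exit 0 if the response looks blocked.
looks_blocked() {
  local status="$1" headers="$2" body="$3"

  # Signal 1: a challenge response header (Cloudflare's cf-mitigated).
  if printf '%s\n' "$headers" | grep -qiE '^cf-mitigated:[[:space:]]*challenge'; then
    return 0
  fi

  # Signal 2: a status code typical of challenges and rate limits.
  case "$status" in
    403|429|503) return 0 ;;
  esac

  # Signal 3: a known challenge marker in the body (example marker only).
  if printf '%s\n' "$body" | grep -qiF 'Just a moment...'; then
    return 0
  fi

  return 1
}

# climb URL -> prints the first unblocked body, or fails at the top rung.
# Assumes a caller-supplied fetch_rung RUNG URL that sets the globals
# STATUS, HEADERS, and BODY (a stand-in for the real page loaders).
climb() {
  local url="$1" rung
  for rung in http browser stealth; do
    fetch_rung "$rung" "$url"
    if ! looks_blocked "$STATUS" "$HEADERS" "$BODY"; then
      printf '%s\n' "$BODY"
      return 0
    fi
  done
  # Still blocked at the top rung: emit nothing and signal failure.
  return 1
}
```

Checking the header before the status code means a 200 response that carries a challenge header is still treated as blocked.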
Unattended runs
Including the stealth rung downloads a few hundred megabytes on first use, so at
an interactive terminal WebReaper asks once, at startup, before committing to it.
In CI, a cron job, or an agent loop there is no one to answer, so set an
environment variable (or pass --auto-stealth) and it includes the stealth rung
without a prompt.
WEBREAPER_AUTO_STEALTH=1 webreaper scrape https://protected.example.com --output page.md
With that set, the climb runs the full ladder unattended. It never spins
forever: a page still blocked at the top rung is dropped and the run exits with a
clear, non-zero status. Pass --no-auto-stealth to cap the climb at a vanilla
browser.
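For a cron job or CI step, the non-zero exit status is the thing to act on. The following is a minimal sketch that scrapes a list of URLs unattended and reports the ones that stayed blocked. It uses only the command, --output flag, and environment variable shown in this post; the scrape_all name, the URL list file, and the page-N.md output layout are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper: scrape every URL in a file without prompts
# and collect failures. File names and layout are assumptions.

# Include the stealth rung without the interactive startup prompt.
export WEBREAPER_AUTO_STEALTH=1

# scrape_all LIST OUTDIR -> exit 0 only if every URL was scraped.
scrape_all() {
  local list="$1" outdir="$2" url n=0 failed=0
  mkdir -p "$outdir"

  # The "|| [ -n ... ]" clause also handles a last line with no newline.
  while IFS= read -r url || [ -n "$url" ]; do
    if [ -z "$url" ]; then
      continue
    fi
    n=$((n + 1))

    # </dev/null keeps the scraper from consuming the URL list on stdin.
    if ! webreaper scrape "$url" --output "$outdir/page-$n.md" </dev/null; then
      echo "blocked or failed: $url" >&2
      failed=$((failed + 1))
    fi
  done < "$list"

  [ "$failed" -eq 0 ]
}
```

A job would call it as scrape_all urls.txt out and let the scheduler or CI runner treat a non-zero return as a failed run, with the blocked URLs listed on stderr.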
The library equivalent
The CLI ladder maps directly onto the library. The middle rung, a real browser, is a page loader from WebReaper.Playwright or the lower-level WebReaper.Cdp.
using WebReaper.Builders;
var engine = await ScraperEngineBuilder
    .CrawlWithBrowser("https://protected.example.com")
    .AsMarkdown()
    .WithPlaywrightPageLoader()
    .WriteToConsole()
    .BuildAsync();
await engine.RunAsync();
When you need the top rung, add the WebReaper.Stealth.CloakBrowser package and swap in the stealth loader. It finds or downloads the stealth binary on first use, launches it with the recommended flags, and routes pages through it.
using WebReaper.Builders;
var engine = await ScraperEngineBuilder
    .CrawlWithBrowser("https://protected.example.com")
    .AsMarkdown()
    .WithCloakBrowser()
    .WriteToConsole()
    .BuildAsync();
await engine.RunAsync();
Because the stealth backend launches a subprocess, dispose the engine so the process is torn down on scope exit. The idiomatic form handles it for you.
await using var engine = await ScraperEngineBuilder
    .CrawlWithBrowser("https://protected.example.com")
    .AsMarkdown()
    .WithCloakBrowser()
    .WriteToConsole()
    .BuildAsync();
await engine.RunAsync();
The first-use download for the stealth backend is large, on the order of a few hundred megabytes, since it is a full Chromium fork. For air-gapped or CI environments, pre-install the binary so a run never tries to download at an inconvenient moment.
An honest limit
Climbing this ladder defeats fingerprinting and JavaScript challenges, which is the bulk of what stops a scraper from a clean IP. It does not defeat an interactive captcha, and no tool truthfully claims to. When a site decides to put a human-in-the-loop puzzle in front of every request, the realistic answer is a captcha-solving service, residential proxies, or a conversation about whether that site wants to be scraped at all. WebReaper's job is to get you past the automated gates cheaply and to tell you plainly when you have hit a wall a better browser cannot climb.