CLI reference
Every WebReaper subcommand, its flags, and the output it produces.
The webreaper binary exposes four subcommands: scrape for a single page,
map to discover URLs, crawl for a whole site, and init to wire up the
Claude Code skill. Markdown goes to stdout by default; pass a schema to get JSON.
Update hints and progress write to stderr, so piping to a file stays clean.
scrape
Turn a single page into clean, LLM-ready Markdown printed to stdout:
webreaper scrape https://news.ycombinator.comWrite the Markdown to a file instead:
webreaper scrape https://news.ycombinator.com --output page.mdExtract structured data with a JSON schema. The result is JSON; a multi-page run emits JSON Lines, one object per page:
webreaper scrape https://news.ycombinator.com --schema schema.jsonRender JavaScript-heavy pages through a real browser, and let the CLI escalate to a stealth browser when it detects a bot-check:
webreaper scrape https://example.com --browser --auto-stealthKey flags:
--output <file>write Markdown to a file rather than stdout.--schema <file>extract JSON using the given schema (JSON Lines for many pages).--browserrender the page with a real browser for JS-heavy sites.--auto-stealthescalate to a stealth browser automatically on a bot-check.--no-auto-stealthdisable the escalation heuristic.
map
Discover URLs on a site without scraping their contents. Filter by a path substring and cap how many you collect:
webreaper map https://example.com --search /blog/ --max-urls 50This prints the matching URLs, which is handy for feeding a follow-up scrape
or crawl run, or for understanding a site's structure first.
crawl
Walk every on-domain page and emit JSON Lines, one record per page. Each record carries the URL, the page title, and the page as Markdown:
webreaper crawl https://example.com > pages.jsonlEach line looks like:
{"url": "https://example.com/about", "title": "About", "markdown": "# About\n..."}Because the output is JSON Lines, you can stream it straight into jq, a
database loader, or an embedding pipeline.
init
Write the bundled Claude Code skill into the current project:
webreaper initThis drops a SKILL.md at .claude/skills/webreaper/SKILL.md. After that,
Claude Code routes scraping requests to the CLI on its own. See the
Claude Code skill page for details.
Output shapes at a glance
scrapewith no schema: Markdown to stdout (or to--output).scrape --schema: JSON for one page, JSON Lines for many.map: a list of discovered URLs.crawl: JSON Lines of{url, title, markdown}, one per page.