LLMs are only as good as the context you give them. WebReaper discovers a site's URLs, fetches each page, and emits clean Markdown, with no schema to define.

Discover, then scrape

# Find the URLs you care about
webreaper map https://example.com --search /blog/ --max-urls 50
 
# Turn the whole site into Markdown, one JSON record per page
webreaper crawl https://example.com > corpus.jsonl

Each record is { "url": "...", "title": "...", "markdown": "..." }, ready to chunk, embed, and store. No selectors, no parsing code, no cleanup pass.
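As a rough illustration of the "chunk" step, here is a minimal Python sketch. It assumes corpus.jsonl holds one JSON object per line with the url, title, and markdown fields shown above. The helper names (iter_records, chunk_markdown, chunk_corpus), the max_chars limit, and the paragraph-packing strategy are hypothetical choices made for this example, not part of WebReaper.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line, assuming one JSON object per line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    Each chunk stays within max_chars, except that a single paragraph
    longer than max_chars is kept whole as its own oversized chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in markdown.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def chunk_corpus(lines: Iterable[str], max_chars: int = 1000) -> Iterator[dict]:
    """Yield chunk dicts that keep the source url and title alongside the text."""
    for record in iter_records(lines):
        for i, text in enumerate(chunk_markdown(record["markdown"], max_chars)):
            yield {
                "url": record["url"],
                "title": record["title"],
                "chunk": i,
                "text": text,
            }


if __name__ == "__main__":
    # Inline sample record so the sketch runs without a crawl output file.
    sample = [
        '{"url": "https://example.com/a", "title": "A", '
        '"markdown": "# A\\n\\nFirst paragraph.\\n\\nSecond paragraph."}'
    ]
    for chunk in chunk_corpus(sample, max_chars=25):
        print(chunk)
```

To run it over a real crawl, pass an open file instead of the sample list, for example `with open("corpus.jsonl") as f: chunks = list(chunk_corpus(f))`. Splitting on blank lines keeps Markdown paragraphs and headings intact. A token-based splitter may suit a particular embedding model better.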