Build LLM context from any site

Web pages in, prompt-ready Markdown out.

Turn blogs, docs, and knowledge bases into clean Markdown ready to feed an LLM or a vector store.

LLMs are only as good as the context you give them. WebReaper discovers a site's URLs, fetches each page, and emits clean Markdown, with no schema to define.

Discover, then scrape

# Find the URLs you care about
webreaper map https://example.com --search /blog/ --max-urls 50
 
# Turn the whole site into Markdown, one JSON record per page
webreaper crawl https://example.com > corpus.jsonl

Each line of corpus.jsonl is one record of the form { "url": "...", "title": "...", "markdown": "..." }, ready to chunk, embed, and store. No selectors, no parsing code, no cleanup pass.
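
As a rough illustration of the "chunk" step, here is a minimal Python sketch that reads the JSONL records and splits each page's Markdown into size-bounded chunks. It assumes only the three fields shown above (url, title, markdown). The helper names (iter_records, chunk_markdown, build_chunks), the chunk_index field, and the 1000-character default are hypothetical choices for this example, not part of WebReaper.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL input: one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    Each chunk holds at most max_chars characters, except that a single
    paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in markdown.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def build_chunks(lines: Iterable[str], max_chars: int = 1000) -> Iterator[dict]:
    """Yield one dict per chunk, carrying the source url and title along."""
    for record in iter_records(lines):
        for index, text in enumerate(chunk_markdown(record["markdown"], max_chars)):
            yield {
                "url": record["url"],
                "title": record["title"],
                "chunk_index": index,  # hypothetical field for this sketch
                "text": text,
            }
```

To run it over the crawl output, open corpus.jsonl and pass the file object to build_chunks; each yielded dict can then go to whatever embedding model and vector store you use. Splitting on blank lines keeps Markdown paragraphs intact, which is a simple default rather than the only reasonable strategy.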

Ready to try it?

Install the CLI and run your first command in seconds.