Jina Reader — Convert Any URL to LLM-Ready Text
Convert any URL to clean, LLM-friendly markdown with a simple prefix. Just prepend r.jina.ai/ to any URL. Handles JS-rendered pages, PDFs, and images. 10K+ stars.
What it is
Jina Reader is a free API service that converts any web URL into clean, LLM-friendly markdown. You prepend r.jina.ai/ to any URL and get back the page content as structured markdown, stripped of ads, navigation, and boilerplate. It handles JavaScript-rendered pages, PDFs, and even images (via OCR).
It serves developers building RAG pipelines, AI agents that need to read web content, or anyone who wants to feed web pages to an LLM without dealing with HTML parsing and content extraction.
How it saves time or tokens
Raw HTML is bloated with navigation, scripts, ads, and formatting tags that waste LLM tokens. Jina Reader extracts just the content and returns clean markdown, typically reducing input size by 70-90% compared to raw HTML. This means lower token costs and better LLM comprehension since the model processes only relevant text.
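To check the size reduction on your own pages, a minimal sketch like the following can help. It compares character counts, which are only a rough proxy for tokens, and `reduction_percent` is a hypothetical helper name, not part of Jina Reader.

```python
def reduction_percent(raw_html: str, markdown: str) -> float:
    """Return how much smaller `markdown` is than `raw_html`, in percent of characters."""
    if not raw_html:
        raise ValueError("raw_html must not be empty")
    return 100.0 * (len(raw_html) - len(markdown)) / len(raw_html)


# Illustrative strings only; in practice pass the raw page source and the Reader output.
raw = "<html><body><nav>menu</nav><p>Hello</p></body></html>"
md = "Hello"
print(f"{reduction_percent(raw, md):.1f}% smaller")
```

For a closer estimate of cost, the same comparison could be done with the token counts reported by your model's tokenizer instead of `len`.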
How to use
- The simplest method -- prepend the URL:
curl https://r.jina.ai/https://example.com
- Use in Python:
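import requests
# Prefix the target URL with the Reader endpoint
url = 'https://r.jina.ai/https://docs.python.org/3/tutorial/'
response = requests.get(url, timeout=30)
response.raise_for_status()  # raise on HTTP errors instead of printing an error page
markdown_content = response.text
print(markdown_content[:500])  # preview the first 500 characters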
- Use in an AI agent pipeline:
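import requests

def read_url(url: str) -> str:
    """Return the page at `url` as markdown via Jina Reader."""
    resp = requests.get(f'https://r.jina.ai/{url}', timeout=30)
    resp.raise_for_status()
    return resp.text

# Feed the result to your LLM as context
context = read_url('https://docs.anthropic.com/en/docs/welcome')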
Example
| Input | Output |
|---|---|
| HTML page with ads, nav, scripts | Clean markdown with headings, lists, code blocks |
| PDF document | Extracted text as markdown |
| Image with text | OCR-extracted text |
| JS-rendered SPA | Rendered content as markdown |
The API handles the rendering and extraction automatically. No configuration needed for most use cases.
Related on TokRepo
- AI tools for web-scraping -- web scraping tools for AI
- AI tools for RAG -- RAG pipeline components and tools
Common pitfalls
- Some websites block automated requests. Jina Reader handles most sites, but very aggressive anti-bot protection may still prevent content extraction.
- Very long pages may be truncated. For large documents, check the response length and consider splitting across multiple requests if needed.
- The free tier has rate limits. For high-volume production use, check Jina's pricing for higher rate limits.
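One way to soften the rate-limit and blocked-request pitfalls above is to retry with exponential backoff. The sketch below is a generic pattern, not Jina's documented behavior: it assumes rate limiting shows up as HTTP 429 and that 5xx errors are transient. `fetch_with_retry` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Tuple

# Assumption: these status codes are worth retrying (rate limit or transient server error).
RETRYABLE = {429, 500, 502, 503, 504}


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch` (returns (status_code, body)) until it succeeds or attempts run out."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status not in RETRYABLE:
            raise RuntimeError(f"request failed with status {status}")
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

With `requests`, `fetch` could be a small function that calls `requests.get('https://r.jina.ai/' + url, timeout=30)` and returns `(resp.status_code, resp.text)`. Passing the HTTP call in as a parameter keeps the retry logic easy to test without network access.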
Frequently Asked Questions
Is Jina Reader free?
Jina Reader offers a free tier with rate limits suitable for development and moderate use. You can make requests without an API key for basic usage. Higher rate limits and additional features are available through Jina's paid plans. For most development and prototyping workflows, the free tier is sufficient.
How does Jina Reader handle JavaScript-rendered pages?
Jina Reader uses a headless browser to render JavaScript before extracting content. Single-page applications (SPAs) built with React, Vue, or Angular are therefore rendered first, and the resulting markdown contains the content as displayed, not the raw HTML source.
Can I use Jina Reader in a RAG pipeline?
Yes, this is one of the primary use cases. You can use Jina Reader to convert web documentation, articles, or any URL into clean markdown, then chunk and embed that markdown for vector search. The clean output reduces noise in your embeddings compared to parsing raw HTML yourself.
Does Jina Reader support PDFs?
Yes. Point Jina Reader at a PDF URL and it returns the text content as markdown, preserving basic structure (headings, lists) and extracting tables where possible. Complex PDF layouts with multi-column text or heavy graphical elements may not convert perfectly.
How does Jina Reader extract content?
Jina Reader combines headless browser rendering, content extraction algorithms (similar to Readability), and format conversion. It identifies the main content area of a page, strips navigation, ads, and boilerplate, and converts the remaining HTML to clean markdown with proper headings, links, code blocks, and lists.
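For the chunk-and-embed workflow mentioned in this FAQ, a heading-based splitter is one possible starting point. This is a minimal sketch under stated assumptions: the returned markdown uses ATX headings (`#` to `######`), chunk size is measured in characters, and `chunk_markdown` is a hypothetical helper, not part of Jina Reader.

```python
import re
from typing import List


def chunk_markdown(markdown: str, max_chars: int = 2000) -> List[str]:
    """Split markdown at headings, then pack adjacent sections into chunks of up to max_chars."""
    # Zero-width split before each line that starts with 1-6 '#' and a space.
    # Limitation: a '# comment' line inside a fenced code block is also treated as a heading.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: List[str] = []
    current = ""
    for section in sections:
        if not section.strip():
            continue
        # Start a new chunk when adding this section would exceed the limit.
        if current and len(current) + len(section) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += section
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

A single section longer than `max_chars` is kept as one oversized chunk here; a production pipeline would likely split such sections further, for example by paragraph, before embedding.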
Source & Thanks
Created by Jina AI. Licensed under Apache 2.0. jina-ai/reader — 10,000+ GitHub stars