Scripts · Mar 31, 2026 · 2 min read

Jina Reader — Convert Any URL to LLM-Ready Text

Convert any URL to clean, LLM-friendly markdown with a simple prefix. Just prepend r.jina.ai/ to any URL. Handles JS-rendered pages, PDFs, and images. 10K+ stars.

TL;DR
Jina Reader converts any URL to LLM-ready markdown by simply prepending r.jina.ai/ to it. Handles JS pages, PDFs, and images.
§01

What it is

Jina Reader is an API service, free to use within rate limits, that converts any web URL into clean, LLM-friendly markdown. You prepend r.jina.ai/ to the URL and get back the page content as structured markdown, stripped of ads, navigation, and boilerplate. It handles JavaScript-rendered pages, PDFs, and even images (via OCR).

It is aimed at developers building RAG pipelines or AI agents that need to read web content, and at anyone who wants to feed web pages to an LLM without writing their own HTML parsing and content extraction.

§02

How it saves time or tokens

Raw HTML is bloated with navigation, scripts, ads, and formatting tags that waste LLM tokens. Jina Reader extracts just the main content and returns clean markdown, typically reducing input size by 70-90% compared with raw HTML, though the exact savings depend on the page. Fewer input tokens mean lower cost, and because the model sees only the relevant text, comprehension tends to improve as well.
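
To check the savings on your own pages, the sketch below fetches a page twice, once directly and once through r.jina.ai/, and compares the sizes. Token counts use a rough 4-characters-per-token rule of thumb (an assumption, not any model's tokenizer), and the helper names are hypothetical. It uses only the standard library; the `fetch` parameter accepts any function that maps a URL to response text, which also makes it easy to test offline.

```python
import urllib.request
from typing import Callable

READER_PREFIX = "https://r.jina.ai/"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and decode the body as UTF-8 (stdlib only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; ~4 chars/token is a heuristic, not a tokenizer."""
    if not text:
        return 0
    return max(1, round(len(text) / chars_per_token))


def reduction_pct(raw: str, cleaned: str) -> float:
    """Percentage by which `cleaned` is shorter than `raw`, in characters."""
    if not raw:
        raise ValueError("raw text is empty")
    return 100.0 * (1 - len(cleaned) / len(raw))


def compare_sizes(url: str, fetch: Callable[[str], str] = fetch_text) -> dict:
    """Fetch the raw page and the Reader version, then report both sizes."""
    raw = fetch(url)                     # original HTML
    cleaned = fetch(READER_PREFIX + url)  # Jina Reader markdown
    return {
        "raw_tokens_est": estimate_tokens(raw),
        "reader_tokens_est": estimate_tokens(cleaned),
        "reduction_pct": reduction_pct(raw, cleaned),
    }
```

Calling `compare_sizes("https://example.com")` makes two live requests and returns a dict with both token estimates and the percentage reduction; results vary from page to page.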

§03

How to use

  1. The simplest method: prepend r.jina.ai/ to the URL:
```shell
curl https://r.jina.ai/https://example.com
```
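  2. Use in Python:
```python
import requests

url = 'https://r.jina.ai/https://docs.python.org/3/tutorial/'
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail loudly on 4xx/5xx responses
markdown_content = response.text
print(markdown_content[:500])  # preview the first 500 characters
```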
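  3. Use in an AI agent pipeline:
```python
import requests

def read_url(url: str) -> str:
    """Fetch a page through Jina Reader and return it as markdown."""
    resp = requests.get(f'https://r.jina.ai/{url}', timeout=30)
    resp.raise_for_status()
    return resp.text

# Feed the result to your LLM as context
context = read_url('https://docs.anthropic.com/en/docs/welcome')
```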
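
If you would rather avoid the `requests` dependency, or want to attach an API key for higher rate limits, the same call can be built with the standard library. This is a sketch under two assumptions: that Jina accepts the key as a standard `Authorization: Bearer <key>` header, and that you store it in a hypothetical `JINA_API_KEY` environment variable. With no key set, the request is sent anonymously, which the free tier allows.

```python
import os
import urllib.request
from typing import Optional
from urllib.parse import urlparse

READER_PREFIX = "https://r.jina.ai/"


def build_reader_request(url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Wrap a target URL in a Jina Reader request, optionally authenticated."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    headers = {}
    if api_key:
        # Assumption: the key is sent as a Bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(READER_PREFIX + url, headers=headers)


def read_url(url: str, timeout: float = 30.0) -> str:
    """Fetch `url` through Jina Reader and return the markdown body."""
    # JINA_API_KEY is a hypothetical variable name; unset means anonymous access.
    req = build_reader_request(url, api_key=os.environ.get("JINA_API_KEY"))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

`read_url("https://example.com")` behaves like the `requests` version above but needs no third-party package; separating request construction from sending keeps the URL and header logic testable without network access.
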
§04

Example

| Input | Output |
|---|---|
| HTML page with ads, nav, scripts | Clean markdown with headings, lists, code blocks |
| PDF document | Extracted text as markdown |
| Image with text | OCR-extracted text |
| JS-rendered SPA | Rendered content as markdown |

The API handles the rendering and extraction automatically. No configuration needed for most use cases.

§05

Common pitfalls

  • Some websites block automated requests. Jina Reader handles most sites, but very aggressive anti-bot protection may prevent content extraction.
  • Very long pages may be truncated. For large documents, check the response length and, if the content looks cut off, consider splitting it across multiple requests.
  • The free tier has rate limits. For high-volume production use, check Jina's pricing for plans with higher limits.
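
For the rate-limit pitfall, a common pattern is to retry with exponential backoff when the service answers HTTP 429 (Too Many Requests). The sketch below assumes Jina signals rate limiting with a 429 status, which is the standard HTTP convention but worth confirming; `read_with_retry`, `max_retries`, and `base_delay` are hypothetical names. The `fetch` and `sleep` parameters are injectable so the retry logic can be tested offline.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

READER_PREFIX = "https://r.jina.ai/"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and decode the body as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def read_with_retry(
    url: str,
    fetch: Callable[[str], str] = fetch_text,
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Fetch `url` via Jina Reader, backing off 1s, 2s, 4s, ... on HTTP 429."""
    reader_url = READER_PREFIX + url
    for attempt in range(max_retries + 1):
        try:
            return fetch(reader_url)
        except urllib.error.HTTPError as err:
            # Only rate-limit responses are retried; other errors surface immediately.
            if err.code != 429 or attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Capping the retries keeps a persistently throttled client from looping forever; for sustained high volume, a plan with higher limits is the real fix.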

Frequently Asked Questions

Is Jina Reader free?

Jina Reader offers a free tier with rate limits suitable for development and moderate use. You can make requests without an API key for basic usage. Higher rate limits and additional features are available through Jina's paid plans. For most development and prototyping workflows, the free tier is sufficient.

How does Jina Reader handle JavaScript-rendered pages?

Jina Reader uses a headless browser to render JavaScript before extracting content. This means single-page applications (SPAs) built with React, Vue, or Angular are fully rendered before the content is extracted. The resulting markdown contains the actual displayed content, not the raw HTML source.

Can I use Jina Reader for building RAG pipelines?

Yes, this is one of the primary use cases. You can use Jina Reader to convert web documentation, articles, or any URL into clean markdown, then chunk and embed that markdown for vector search. The clean output reduces noise in your embeddings compared to parsing raw HTML yourself.
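
A minimal chunking sketch for that workflow: split the Reader's markdown at ATX headings (`#` through `######`), then pack paragraphs into chunks of at most `max_chars` characters. This is one simple strategy among many, not something Jina Reader does itself; `chunk_markdown` and `max_chars` are hypothetical names, and the splitter does not special-case heading-like lines inside code fences.

```python
import re
from typing import List


def chunk_markdown(md: str, max_chars: int = 2000) -> List[str]:
    """Split markdown into heading-aligned chunks no longer than max_chars."""
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    # Zero-width split just before every line that starts with 1-6 '#' and a space.
    for section in re.split(r"(?m)^(?=#{1,6} )", md):
        section = section.strip()
        if not section:
            continue
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Oversized section: greedily pack paragraphs (blank-line separated).
        current = ""
        for para in section.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) <= max_chars:
                current = candidate
                continue
            if current:
                chunks.append(current)
            # A single paragraph longer than max_chars is hard-split.
            while len(para) > max_chars:
                chunks.append(para[:max_chars])
                para = para[max_chars:]
            current = para
        if current:
            chunks.append(current)
    return chunks
```

Each returned chunk can then be embedded and stored in your vector index; the 2,000-character default is an arbitrary starting point, not a recommendation from Jina.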

Does Jina Reader support PDF extraction?

Yes. Point Jina Reader at a PDF URL and it returns the text content as markdown, preserving basic structure (headings, lists) and extracting tables where possible. Complex PDF layouts with multi-column text or heavy graphical elements may not convert perfectly.

How does the content extraction work?

Jina Reader uses a combination of headless browser rendering, content extraction algorithms (similar to readability), and formatting conversion. It identifies the main content area of a page, strips navigation, ads, and boilerplate, and converts the remaining HTML to clean markdown with proper headings, links, code blocks, and lists.
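
To make the last two steps concrete, here is a deliberately tiny, stdlib-only illustration of the general idea: drop boilerplate elements, then map the remaining headings, paragraphs, and list items to markdown. It is not Jina Reader's implementation, which, as described above, also renders the page in a headless browser and uses readability-style analysis to find the main content; the tag list in `SKIP_TAGS` and all names here are assumptions chosen for the example.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Elements treated as boilerplate in this toy (an assumption, not Jina's list).
SKIP_TAGS = {"head", "script", "style", "noscript", "nav", "header", "footer", "aside"}
HEADING_LEVELS = {f"h{i}": i for i in range(1, 7)}
BLOCK_TAGS = {"p", "li", *HEADING_LEVELS}


class ToyExtractor(HTMLParser):
    """Collect text blocks outside boilerplate tags and prefix them as markdown."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0          # > 0 while inside a SKIP_TAGS element
        self.blocks: List[str] = []  # finished markdown blocks
        self.buffer: List[str] = []  # text of the block being built
        self.prefix = ""             # "# ", "- ", etc. for the current block

    def flush(self) -> None:
        text = " ".join("".join(self.buffer).split())  # collapse whitespace
        if text:
            self.blocks.append(self.prefix + text)
        self.buffer = []
        self.prefix = ""

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.flush()
            if tag in HEADING_LEVELS:
                self.prefix = "#" * HEADING_LEVELS[tag] + " "
            elif tag == "li":
                self.prefix = "- "

    def handle_endtag(self, tag: str) -> None:
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data: str) -> None:
        if self.skip_depth == 0:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = ToyExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text
    return "\n\n".join(parser.blocks)
```

Real pages need far more than this (main-content scoring, links, code blocks, tables), which is exactly the work Jina Reader takes off your hands.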


Source & Thanks

Created by Jina AI. Licensed under Apache 2.0. jina-ai/reader — 10,000+ GitHub stars
