Quick Use
- Already have a Tavily API key (from search asset)
- client.extract(urls=[...], extract_depth="advanced"): pass up to 20 URLs
- Iterate response["results"] for clean markdown per URL
Intro
Tavily Extract takes a list of URLs and returns clean, LLM-ready markdown: no HTML, no ads, no nav menus, no cookie banners. It accepts up to 20 URLs per call. Set extract_depth="advanced" for JavaScript-heavy or otherwise hard-to-parse sites.
Best for: agents that already have a list of URLs (from Search, your own sources, or user input) and need the actual page content.
Works with: Tavily REST API, Python / TypeScript SDK.
Setup time: 2 minutes.
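A single call accepts at most 20 URLs, so a longer list has to be split into batches. Below is a minimal sketch of that. It assumes only the client.extract(urls=..., extract_depth=...) call and the results / failed_results response keys used in the examples on this page. extract_in_batches is a hypothetical helper name, not part of the Tavily SDK.

```python
from typing import Any, Dict, List


def extract_in_batches(client: Any, urls: List[str], batch_size: int = 20,
                       **kwargs: Any) -> Dict[str, List[dict]]:
    """Hypothetical helper (not in the Tavily SDK): call client.extract on
    chunks of at most batch_size URLs and merge the per-call responses."""
    merged: Dict[str, List[dict]] = {"results": [], "failed_results": []}
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        # Extra keyword arguments (e.g. extract_depth) are forwarded unchanged
        response = client.extract(urls=batch, **kwargs)
        merged["results"].extend(response.get("results", []))
        merged["failed_results"].extend(response.get("failed_results", []))
    return merged


# Usage, with a real client:
# client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
# merged = extract_in_batches(client, my_urls, extract_depth="advanced")
```

The batches are sent one after another. This keeps the sketch simple and avoids bursts against rate limits, at the cost of total latency.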
Extract clean content
```python
import os

from tavily import TavilyClient

client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

response = client.extract(
    urls=[
        "https://docs.anthropic.com/en/docs/claude-code",
        "https://docs.cursor.com/composer",
        "https://docs.continue.dev/intro",
    ],
    extract_depth="advanced",  # vs "basic": slower, but cleaner on JS-heavy sites
    include_images=False,
)

for result in response["results"]:
    print(result["url"])
    print(result["raw_content"][:500])  # first 500 chars of the markdown
    print(f"({len(result['raw_content'])} chars total)")

# Failed URLs (404, blocked, etc.) are listed separately
for failed in response["failed_results"]:
    print(f"FAILED: {failed['url']}: {failed['error']}")
```
Pair with Search for full RAG
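```python
# Reuses `client` from the previous example
search = client.search(query="claude code subagents best practices", max_results=10)
urls = [r["url"] for r in search["results"]]

# Get full content for the top 5 results only
extracts = client.extract(urls=urls[:5], extract_depth="advanced")

# Feed both the summaries (from Search) and the full text (from Extract) to an LLM
snippets = "\n".join(f"{r['title']}: {r['content']}" for r in search["results"])
full_text = "\n\n".join(e["raw_content"] for e in extracts["results"])
context = snippets + "\n\n" + full_text
```
Cost vs Search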
| Endpoint | Cost | Output |
|---|---|---|
| /search | 1-2 credits | Snippets + answer + URLs |
| /extract (basic) | 1 credit / URL | Full markdown of 1 URL |
| /extract (advanced) | 2 credits / URL | Full markdown of 1 URL, JS rendering |
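The figures in the table above can be turned into a quick budget estimate for a run. This sketch hard-codes those numbers: 1 credit per URL for basic Extract, 2 credits per URL for advanced, and the upper bound of 2 credits per Search call. It treats them as assumptions copied from this page, not values fetched from Tavily. estimate_credits is a hypothetical name.

```python
from typing import Literal

# Per-URL Extract cost and worst-case Search cost, copied from the table above.
# These are assumptions taken from this page, not values queried from Tavily.
EXTRACT_CREDITS_PER_URL = {"basic": 1, "advanced": 2}
SEARCH_CREDITS_MAX = 2


def estimate_credits(num_searches: int, num_extracted_urls: int,
                     extract_depth: Literal["basic", "advanced"] = "advanced") -> int:
    """Upper-bound credit estimate for a Search + Extract pipeline."""
    if num_searches < 0 or num_extracted_urls < 0:
        raise ValueError("counts must be non-negative")
    return (num_searches * SEARCH_CREDITS_MAX
            + num_extracted_urls * EXTRACT_CREDITS_PER_URL[extract_depth])


# One search plus advanced extraction of the top 5 URLs
print(estimate_credits(1, 5))   # 12
# Extracting all 10 search results instead
print(estimate_credits(1, 10))  # 22
```

The gap between those two estimates is the reason to extract only the results worth reading in full.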
For RAG, use Search to find URLs, then use Extract only on the ones worth reading in full. Don't extract every search result: for most of them, the snippet already in the Search output is good enough.
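One way to decide which results are worth deep-reading is to filter on the relevance score that Search returns with each result. Here is a minimal sketch. It assumes each entry in search["results"] carries url and score fields. The 0.5 threshold, the limit of 5 and the select_urls_for_extract name are illustrative choices, not Tavily defaults.

```python
from typing import List


def select_urls_for_extract(search_response: dict, min_score: float = 0.5,
                            limit: int = 5) -> List[str]:
    """Return up to `limit` URLs whose search score is at least `min_score`,
    highest score first. Threshold and limit are illustrative defaults."""
    ranked = sorted(search_response.get("results", []),
                    key=lambda r: r.get("score", 0.0), reverse=True)
    return [r["url"] for r in ranked if r.get("score", 0.0) >= min_score][:limit]


# Offline example with a hand-written response of the same shape
sample = {"results": [
    {"url": "https://a.example", "score": 0.91},
    {"url": "https://b.example", "score": 0.32},
    {"url": "https://c.example", "score": 0.77},
]}
print(select_urls_for_extract(sample))  # ['https://a.example', 'https://c.example']
```

You can pass the selected list straight to client.extract(urls=..., extract_depth="advanced") in place of a fixed urls[:5] slice.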
FAQ
Q: How is Tavily Extract different from Firecrawl? A: Both produce LLM-ready markdown. Firecrawl is dedicated to scraping with more knobs (Crawl, Map, structured Extract via schema). Tavily Extract is the URL-to-content companion of Tavily Search, optimized for batch extraction during agent runs. Different ergonomics, similar output.
Q: Does it handle paywalls? A: No — Tavily Extract respects paywalls. It returns the public preview content, not the paywalled article. For internal authenticated sources, use Tavily's enterprise tier with custom auth.
Q: Can I extract images?
A: Yes — set include_images=True. The response includes image URLs and alt text. Images are linked, not downloaded; you'd fetch them separately if needed.
Source & Thanks
Built by Tavily. Commercial product with free tier.
tavily.com/docs/extract — Extract docs