Workflows · May 8, 2026 · 4 min read

Tavily Extract — Pull Clean Content from Any URL

Tavily Extract converts up to 20 URLs into LLM-ready markdown in one API call. It skips ads, navigation, and footers, and returns clean prose with citation metadata.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Stage only
Trust: New
Entrypoint: Asset
Universal CLI install command
npx tokrepo install 430a3d0e-2b58-496c-91e8-bbdb5ad65572
Intro

Tavily Extract takes a list of URLs and returns clean, LLM-ready markdown: no HTML, no ads, no nav menus, no cookie banners. Up to 20 URLs per call, with extract_depth: "advanced" for tricky sites.

Best for: agents that already have a list of URLs (from Search, your own sources, or user input) and need the actual content.
Works with: Tavily REST API, Python / TypeScript SDKs.
Setup time: 2 minutes.


Extract clean content

import os

from tavily import TavilyClient

client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

response = client.extract(
    urls=[
        "https://docs.anthropic.com/en/docs/claude-code",
        "https://docs.cursor.com/composer",
        "https://docs.continue.dev/intro",
    ],
    extract_depth="advanced",  # vs "basic" — slower but cleaner on JS-heavy sites
    include_images=False,
)

for result in response["results"]:
    print(result["url"])
    print(result["raw_content"][:500])
    print(f"({len(result['raw_content'])} chars total)")

# Failed URLs (404, blocked, etc.) are listed separately
for failed in response["failed_results"]:
    print(f"FAILED: {failed['url']}: {failed['error']}")
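The 20-URL cap means longer reading lists need batching before they hit client.extract. A minimal sketch of that batching (the `chunked` helper is an illustrative assumption, not part of the Tavily SDK):

```python
def chunked(urls, size=20):
    """Yield successive batches of at most `size` URLs,
    since Extract accepts up to 20 URLs per call."""
    for i in range(0, len(urls), size):
        yield urls[i:i + size]

all_urls = [f"https://example.com/page/{n}" for n in range(45)]
batch_sizes = [len(batch) for batch in chunked(all_urls)]
print(batch_sizes)  # [20, 20, 5]
```

Each batch would then go through its own client.extract(urls=batch, ...) call in turn.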

Pair with Search for full RAG

search = client.search(query="claude code subagents best practices", max_results=10)
urls = [r["url"] for r in search["results"]]

# Get full content for top 5
extracts = client.extract(urls=urls[:5], extract_depth="advanced")

# Now feed both summaries (from search) and full text (from extract) to an LLM
context = "\n\n".join(e["raw_content"] for e in extracts["results"])
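Full-text extracts can easily blow past an LLM's context window, so it is worth capping the concatenation above. A rough character-budget guard (the `build_context` helper and its 40,000-char default are hypothetical choices, not Tavily API features):

```python
def build_context(results, max_chars=40_000):
    """Concatenate raw_content fields until a rough character budget is hit.
    The budget counts content only, not the joining separators."""
    parts, total = [], 0
    for r in results:
        text = r["raw_content"]
        if total + len(text) > max_chars:
            text = text[: max_chars - total]  # keep a truncated tail of the last doc
        parts.append(text)
        total += len(text)
        if total >= max_chars:
            break
    return "\n\n".join(parts)
```

A token-based budget (via your model's tokenizer) would be more precise; characters are just the zero-dependency approximation.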

Cost vs Search

Endpoint              Cost              Output
/search               1-2 credits       Snippets + answer + URLs
/extract (basic)      1 credit / URL    Full markdown per URL
/extract (advanced)   2 credits / URL   Full markdown per URL, JS rendering

For RAG: use Search to find URLs, then Extract only the ones worth deep-reading. Don't extract every search result; most already arrive as summary-quality snippets in the Search output.
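The cost table turns into simple arithmetic when budgeting a pipeline. A sketch (the helper name and the 2-credit-per-search worst case are my assumptions, derived from the table's 1-2 credit range):

```python
def estimate_credits(search_calls, basic_urls, advanced_urls, search_cost=2):
    """Rough credit budget per the table: search 1-2 credits per call,
    basic extract 1 credit per URL, advanced extract 2 credits per URL."""
    return search_calls * search_cost + basic_urls * 1 + advanced_urls * 2

# The search-then-extract pattern above: 1 search + 5 advanced extracts
print(estimate_credits(search_calls=1, basic_urls=0, advanced_urls=5))  # 12
```

Useful as a sanity check before extracting a large batch with extract_depth="advanced", which doubles the per-URL cost.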


FAQ

Q: How is Tavily Extract different from Firecrawl? A: Both produce LLM-ready markdown. Firecrawl is dedicated to scraping with more knobs (Crawl, Map, structured Extract via schema). Tavily Extract is the URL-to-content companion of Tavily Search, optimized for batch extraction during agent runs. Different ergonomics, similar output.

Q: Does it handle paywalls? A: No — Tavily Extract respects paywalls. It returns the public preview content, not the paywalled article. For internal authenticated sources, use Tavily's enterprise tier with custom auth.

Q: Can I extract images? A: Yes — set include_images=True. The response includes image URLs and alt text. Images are linked, not downloaded; you'd fetch them separately if needed.


Quick Use

  1. Reuse your existing Tavily API key (the same key as the Search asset)
  2. client.extract(urls=[...], extract_depth="advanced") — pass up to 20 URLs
  3. Iterate response["results"] for clean markdown per URL

Source & Thanks

Built by Tavily. Commercial product with free tier.

tavily.com/docs/extract — Extract docs
