Knowledge · May 11, 2026 · 5 min read

Perplexity Sonar API — Search-Grounded LLM in One Call

Perplexity Sonar API returns LLM answers grounded in real-time web search with citations. Tiers: sonar / sonar-pro / sonar-reasoning.

Agent ready

Safe staging for this asset

This asset is staged first. The copied prompt tells the agent to inspect the staged files and ask before activating scripts, MCP config, or global config.

Stage only · 27/100 · Policy: stage
Agent surface: Any MCP/CLI agent
Kind: Knowledge
Install: Stage only
Trust: Community
Entrypoint: Asset
Safe staging command
npx -y tokrepo@latest install 25b2aa98-cc43-4d6c-b654-5baa3f3c9f62 --target codex

Stages files first; activation requires review of the staged README and plan.

Intro

Perplexity's Sonar API is a one-call alternative to building search + scrape + chunk + RAG yourself: you send a question, Perplexity searches the web in real time, and it returns an LLM answer with inline numbered citations to the source URLs. The core tiers are sonar (fast/cheap), sonar-pro (better answer quality, more sources), and sonar-reasoning (chain-of-thought, longer think time); the tier table below also lists sonar-reasoning-pro and sonar-deep-research. Best for: news Q&A, fact-checking, anywhere you need a fresh answer with sources. Works with: OpenAI-compatible client (Python, JS), curl, LangChain. Setup time: 2 minutes.
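To make "one call" concrete, here is a minimal sketch of the underlying HTTP request, built with the Python standard library only. It assumes the usual OpenAI-compatible chat-completions path under the base URL and Bearer-token auth; `build_request` and the placeholder key are illustrative names, not part of any SDK. The request is constructed but not sent.

```python
import json
import urllib.request

# Assumption: the OpenAI-compatible chat-completions path under the base URL.
API_URL = "https://api.perplexity.ai/chat/completions"


def build_request(question: str, api_key: str, model: str = "sonar") -> urllib.request.Request:
    """Build (but do not send) the POST request for a single Sonar question."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "pplx-placeholder" is a dummy value; use your real key from the environment.
req = build_request("Who won the most recent Turing Award?", api_key="pplx-placeholder")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; in practice the OpenAI client shown in the next section is usually more convenient.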


Python (openai-compatible)

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.perplexity.ai",
    api_key=os.environ["PPLX_API_KEY"],
)

resp = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What are the top 3 AI funding rounds this week?"}],
)
print(resp.choices[0].message.content)
# Response includes inline citations like [1][2][3]

# Read citation URLs separately
print(resp.citations)   # ["https://...", "https://...", "https://..."]
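Because `citations` is a Perplexity extension rather than part of the OpenAI response schema, it may be safer to read it defensively. The helper below is a sketch under that assumption; `get_citations` is a hypothetical name, and the `SimpleNamespace` objects merely stand in for a real response.

```python
from types import SimpleNamespace
from typing import Any, List


def get_citations(resp: Any) -> List[str]:
    """Return the citation URLs attached to a response, or [] if none are present."""
    citations = getattr(resp, "citations", None)
    return list(citations) if citations else []


# Stand-in objects for demonstration; a real `resp` comes from
# client.chat.completions.create(...).
with_sources = SimpleNamespace(citations=["https://example.com/a", "https://example.com/b"])
without_sources = SimpleNamespace()

print(get_citations(with_sources))
print(get_citations(without_sources))
```

This keeps downstream code from failing when a response arrives without the extra field.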

Filter sources by domain or recency

resp = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the latest Anthropic announcement?"}],
    extra_body={
        "search_domain_filter": ["anthropic.com", "techcrunch.com"],   # whitelist
        "search_recency_filter": "week",                                # day | week | month | year
        "return_images": False,
        "return_related_questions": True,
    },
)
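If filters are assembled from user input, a small validating builder can catch typos before the request is sent. This is a sketch only: `build_search_options` is a hypothetical helper, and the allowed recency values are simply those listed in the comment in the example above.

```python
from typing import Any, Dict, List, Optional

# Recency values taken from the comment in the example above.
ALLOWED_RECENCY = {"day", "week", "month", "year"}


def build_search_options(
    domains: Optional[List[str]] = None,
    recency: Optional[str] = None,
    related_questions: bool = False,
) -> Dict[str, Any]:
    """Assemble an extra_body dict, rejecting recency values not listed above."""
    if recency is not None and recency not in ALLOWED_RECENCY:
        raise ValueError(f"unsupported recency filter: {recency!r}")
    options: Dict[str, Any] = {"return_related_questions": related_questions}
    if domains:
        options["search_domain_filter"] = list(domains)
    if recency:
        options["search_recency_filter"] = recency
    return options


opts = build_search_options(["anthropic.com", "techcrunch.com"], recency="week")
print(opts)
```

The returned dict can be passed as `extra_body=opts` to `client.chat.completions.create`.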

Model tiers (May 2026)

| Model | Use case | Cost ($ per 1M tokens) | Latency |
|---|---|---|---|
| sonar | Quick lookups, single-source Q&A | $1 in / $1 out | ~1–3s |
| sonar-pro | Production answer quality, multi-source | $3 in / $15 out | ~3–7s |
| sonar-reasoning | Hard reasoning, citations + thinking | $1 in / $5 out | ~10–25s |
| sonar-reasoning-pro | Top quality reasoning | $2 in / $8 out | ~15–40s |
| sonar-deep-research | Long research reports with 30+ sources | $2 in / $8 out + per-search | ~minutes |
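To compare tiers for a given workload, the per-token prices in the table above can be turned into a rough per-request estimate. This sketch uses only the token prices listed there; per-search fees and any other charges are deliberately left out, and `estimate_cost` is a hypothetical helper.

```python
# (input $/1M tokens, output $/1M tokens), copied from the table above.
PRICES = {
    "sonar": (1.0, 1.0),
    "sonar-pro": (3.0, 15.0),
    "sonar-reasoning": (1.0, 5.0),
    "sonar-reasoning-pro": (2.0, 8.0),
    "sonar-deep-research": (2.0, 8.0),  # per-search fees are NOT included here
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the token cost in dollars for one request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


cost = estimate_cost("sonar-pro", input_tokens=500, output_tokens=1_000)
print(f"${cost:.4f}")  # prints $0.0165
```

For sonar-pro, output tokens dominate: 1,000 output tokens cost ten times as much as 500 input tokens.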

When NOT to use Sonar

If your data is private or lives only in your own corpus rather than on the public web, use a private RAG pipeline instead (e.g., Tavily + your vector store). Sonar searches the public web only.


FAQ

Q: Sonar vs Grok Live Search vs Tavily? A: Grok bundles search into the same model call cheaply. Sonar gives stronger answer quality and richer citations. Tavily is search-only (you bring your own LLM). Use Sonar when answer quality matters; Tavily when you need control over the LLM stage.

Q: Are citations clickable? A: Citations come back as a citations array of URLs separately from the markdown answer. Render them as numbered footnotes in your UI. Sonar's content also embeds [1], [2] inline so you can map them visually.
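One way to render footnotes is to collect the `[n]` markers that appear in the answer and append the matching URLs. The sketch below assumes marker `[n]` refers to the n-th entry (1-based) of the citations array; `render_footnotes` and the sample data are illustrative only.

```python
import re
from typing import List


def render_footnotes(answer: str, citations: List[str]) -> str:
    """Append a numbered source list for every [n] marker that appears in the answer."""
    used = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    lines = [
        f"[{n}] {citations[n - 1]}"
        for n in used
        if 1 <= n <= len(citations)  # skip markers with no matching URL
    ]
    return answer if not lines else answer + "\n\n" + "\n".join(lines)


# Sample data for demonstration; real values come from the API response.
sample_answer = "Rates were held steady [1], and markets rose [2]."
sample_urls = ["https://example.com/rates", "https://example.com/markets"]
print(render_footnotes(sample_answer, sample_urls))
```

In an HTML UI the same mapping could produce anchor tags instead of plain lines.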

Q: Rate limits? A: Standard tier: ~50 RPM on sonar, ~20 RPM on sonar-pro. Higher tiers are available in console.perplexity.ai. For production scaling beyond that, talk to Perplexity Sales, who offer dedicated capacity.
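To stay under a requests-per-minute budget on the client side, calls can be spaced at least 60/RPM seconds apart. The class below is a minimal single-threaded sketch; `Throttle` is a hypothetical helper, and the RPM value is the approximate sonar-pro figure quoted above, which may differ for your account.

```python
import time
from typing import Callable


class Throttle:
    """Space calls so they never exceed a requests-per-minute budget."""

    def __init__(
        self,
        rpm: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following call one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay


throttle = Throttle(rpm=20)  # ~20 RPM is the sonar-pro figure quoted above
print(f"minimum spacing: {throttle.interval:.1f}s")  # prints minimum spacing: 3.0s
```

Calling `throttle.wait()` immediately before each `client.chat.completions.create(...)` keeps a single worker within the budget; multi-process setups would need a shared limiter.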


Quick Use

  1. Get PPLX_API_KEY at perplexity.ai/settings/api
  2. OpenAI(base_url='https://api.perplexity.ai', api_key=os.environ['PPLX_API_KEY'])
  3. Use model='sonar-pro' and read resp.citations for source URLs

Source & Thanks

Built by Perplexity. Sonar API docs at docs.perplexity.ai.

Official SDK pending; OpenAI-compatible client works today.
