Knowledge · May 11, 2026 · 5 min read

Perplexity Sonar API — Search-Grounded LLM in One Call

Perplexity Sonar API returns LLM answers grounded in real-time web search with citations. Tiers: sonar / sonar-pro / sonar-reasoning.

Agent ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 15/100
Agent surface: Any MCP/CLI agent
Kind: Knowledge
Install: Stage only
Trust: New
Entrypoint: Asset
Universal CLI install command
npx tokrepo install 25b2aa98-cc43-4d6c-b654-5baa3f3c9f62
Intro

Perplexity's Sonar API is a one-call alternative to building search + scrape + chunk + RAG yourself — you send a question, Perplexity searches the web in real time and returns an LLM answer with inline numbered citations to the source URLs. Three tiers: sonar (fast/cheap), sonar-pro (better answer quality, more sources), sonar-reasoning (chain-of-thought, longer think time). Best for: news Q&A, fact-checking, anywhere you need a fresh answer with sources. Works with: OpenAI-compatible client (Python, JS), curl, LangChain. Setup time: 2 minutes.


Python (openai-compatible)

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.perplexity.ai",
    api_key=os.environ["PPLX_API_KEY"],
)

resp = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What are the top 3 AI funding rounds this week?"}],
)
print(resp.choices[0].message.content)
# Response includes inline citations like [1][2][3]

# Read citation URLs separately
print(resp.citations)   # ["https://...", "https://...", "https://..."]
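To map the inline [1][2] markers to their URLs in plain text, you can append the citations array as numbered footnotes. A minimal sketch (the helper name and sample data are hypothetical; the citation ordering matches the inline numbering, as shown above):

```python
def render_with_footnotes(content: str, citations: list[str]) -> str:
    """Append a numbered source list matching the inline [n] markers."""
    lines = [content, "", "Sources:"]
    for i, url in enumerate(citations, start=1):
        lines.append(f"[{i}] {url}")
    return "\n".join(lines)

# Sample data standing in for resp.choices[0].message.content / resp.citations:
answer = "Anthropic announced a new model this week [1][2]."
urls = ["https://example.com/a", "https://example.com/b"]
print(render_with_footnotes(answer, urls))
```

The same mapping works for HTML output; just render each `[n]` as a link to `citations[n - 1]`.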

Filter sources by domain or recency

resp = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the latest Anthropic announcement?"}],
    extra_body={
        "search_domain_filter": ["anthropic.com", "techcrunch.com"],   # whitelist
        "search_recency_filter": "week",                                # day | week | month | year
        "return_images": False,
        "return_related_questions": True,
    },
)

Model tiers (May 2026)

| Model | Use case | Cost ($/1M tokens) | Latency |
|---|---|---|---|
| sonar | Quick lookups, single-source Q&A | $1 in / $1 out | ~1–3 s |
| sonar-pro | Production answer quality, multi-source | $3 in / $15 out | ~3–7 s |
| sonar-reasoning | Hard reasoning, citations + thinking | $1 in / $5 out | ~10–25 s |
| sonar-reasoning-pro | Top-quality reasoning | $2 in / $8 out | ~15–40 s |
| sonar-deep-research | Long research reports, 30+ sources | $2 in / $8 out + per-search fee | minutes |
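As a quick sanity check on the numbers above, here is a small cost estimator (a hypothetical helper; prices are copied from the table, and sonar-deep-research is omitted because its per-search fee is not token-based):

```python
# $ per 1M tokens (input, output), mirroring the May 2026 table above.
PRICES = {
    "sonar": (1.00, 1.00),
    "sonar-pro": (3.00, 15.00),
    "sonar-reasoning": (1.00, 5.00),
    "sonar-reasoning-pro": (2.00, 8.00),
}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Token cost in dollars for one request (excludes any per-search fees)."""
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

# 2,000 prompt tokens + 500 answer tokens on sonar-pro:
print(f"${estimate_cost('sonar-pro', 2000, 500):.4f}")  # → $0.0135
```

At these rates a typical sonar-pro Q&A lands at fractions of a cent per call; the output-token price dominates for long answers.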

When NOT to use Sonar

If your data is private, not on the web, or lives in your own corpus, use a private RAG pipeline instead (e.g., Tavily + your vector store). Sonar searches the public web only.


FAQ

Q: Sonar vs Grok Live Search vs Tavily? A: Grok bundles search into the same model call cheaply. Sonar gives stronger answer quality and richer citations. Tavily is search-only (you bring your own LLM). Use Sonar when answer quality matters; Tavily when you need control over the LLM stage.

Q: Are citations clickable? A: Citations come back as a citations array of URLs separately from the markdown answer. Render them as numbered footnotes in your UI. Sonar's content also embeds [1], [2] inline so you can map them visually.

Q: Rate limits? A: Standard tier: ~50 RPM on sonar, ~20 RPM on sonar-pro. Higher tiers are available in console.perplexity.ai. For production scale beyond that, contact Perplexity sales — they offer dedicated capacity.
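At those limits, bursty workloads will hit 429s. A minimal retry sketch with jittered exponential backoff (a hypothetical helper, not part of the Sonar API — in real code, catch the client's specific rate-limit exception rather than bare Exception):

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a zero-arg callable with jittered exponential backoff."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # 1s, 2s, 4s, ... plus jitter, capped at 30s (with base_delay=1.0)
            time.sleep(min(base_delay * 2 ** attempt + random.random() * base_delay, 30))

# Usage: with_backoff(lambda: client.chat.completions.create(...))
```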


Quick Use

  1. Get PPLX_API_KEY at perplexity.ai/settings/api
  2. OpenAI(base_url='https://api.perplexity.ai', api_key=PPLX_KEY)
  3. Use model='sonar-pro' and read resp.citations for source URLs
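The three steps above can also be done without the OpenAI SDK over plain HTTP. A sketch of the request shape, assuming the OpenAI-compatible /chat/completions path on api.perplexity.ai (the actual POST is commented out since it needs a live key):

```python
import json
import os

payload = {
    "model": "sonar-pro",
    "messages": [{"role": "user", "content": "What's new in AI this week?"}],
}
headers = {
    "Authorization": f"Bearer {os.environ.get('PPLX_API_KEY', '')}",
    "Content-Type": "application/json",
}

# import requests
# resp = requests.post("https://api.perplexity.ai/chat/completions",
#                      headers=headers, data=json.dumps(payload))
# print(resp.json()["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```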

Source & Thanks

Built by Perplexity. Sonar API docs at docs.perplexity.ai.

Official SDK pending; OpenAI-compatible client works today.

