Knowledge · May 11, 2026 · 5 min read

Perplexity Sonar API — Search-Grounded LLM in One Call

Perplexity Sonar API returns LLM answers grounded in real-time web search with citations. Tiers: sonar / sonar-pro / sonar-reasoning.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and raw content so agents can evaluate compatibility, risk, and next steps.

Stage only · 15/100
Agent surface: Any MCP/CLI agent
Type: Knowledge
Install: Stage only
Trust: New
Input: Asset
Universal CLI command
npx tokrepo install 25b2aa98-cc43-4d6c-b654-5baa3f3c9f62
Introducción

Perplexity's Sonar API is a one-call alternative to building search + scrape + chunk + RAG yourself — you send a question, Perplexity searches the web in real time and returns an LLM answer with inline numbered citations to the source URLs. Three tiers: sonar (fast/cheap), sonar-pro (better answer quality, more sources), sonar-reasoning (chain-of-thought, longer think time). Best for: news Q&A, fact-checking, anywhere you need a fresh answer with sources. Works with: OpenAI-compatible client (Python, JS), curl, LangChain. Setup time: 2 minutes.


Python (openai-compatible)

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.perplexity.ai",
    api_key=os.environ["PPLX_API_KEY"],
)

resp = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What are the top 3 AI funding rounds this week?"}],
)
print(resp.choices[0].message.content)
# Response includes inline citations like [1][2][3]

# Read citation URLs separately
print(resp.citations)   # ["https://...", "https://...", "https://..."]

Filter sources by domain or recency

resp = client.chat.completions.create(
    model="sonar-pro",
    messages=[{"role": "user", "content": "What's the latest Anthropic announcement?"}],
    extra_body={
        "search_domain_filter": ["anthropic.com", "techcrunch.com"],   # whitelist
        "search_recency_filter": "week",                                # day | week | month | year
        "return_images": False,
        "return_related_questions": True,
    },
)

Model tiers (May 2026)

| Model | Use case | Cost ($/1M tokens) | Latency |
| --- | --- | --- | --- |
| sonar | Quick lookups, single-source Q&A | $1 in / $1 out | ~1–3 s |
| sonar-pro | Production answer quality, multi-source | $3 in / $15 out | ~3–7 s |
| sonar-reasoning | Hard reasoning, citations + thinking | $1 in / $5 out | ~10–25 s |
| sonar-reasoning-pro | Top-quality reasoning | $2 in / $8 out | ~15–40 s |
| sonar-deep-research | Long research reports with 30+ sources | $2 in / $8 out + per-search fee | ~minutes |
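The per-token prices in the table translate directly into a back-of-envelope cost estimate. A minimal sketch, using the May 2026 prices above (verify current numbers against Perplexity's pricing page before relying on them; `estimate_cost` is an illustrative helper, not part of any SDK):

```python
# Per-1M-token (input, output) prices from the tier table above.
PRICES = {
    "sonar": (1.0, 1.0),
    "sonar-pro": (3.0, 15.0),
    "sonar-reasoning": (1.0, 5.0),
    "sonar-reasoning-pro": (2.0, 8.0),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost for one call, ignoring any per-search fees."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# e.g. a sonar-pro call with 1,000 input tokens and 500 output tokens:
print(estimate_cost("sonar-pro", 1_000, 500))  # 0.0105
```

Note that sonar-deep-research also bills per search performed, which this token-only estimate does not capture.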

When NOT to use Sonar

If your data is private, not on the web, or in your own corpus — use a private RAG pipeline (e.g., Tavily + your vector store). Sonar searches public web only.


FAQ

Q: Sonar vs Grok Live Search vs Tavily? A: Grok bundles search into the same model call cheaply. Sonar gives stronger answer quality and richer citations. Tavily is search-only (you bring your own LLM). Use Sonar when answer quality matters; Tavily when you need control over the LLM stage.

Q: Are citations clickable? A: Citations come back as a citations array of URLs separately from the markdown answer. Render them as numbered footnotes in your UI. Sonar's content also embeds [1], [2] inline so you can map them visually.
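The inline-marker-plus-array convention described above maps naturally onto numbered footnotes. A minimal sketch, assuming the content embeds 1-based markers like [1] that index into the citations array (`footnotes` is a hypothetical helper for your rendering layer, not part of any SDK):

```python
def footnotes(content: str, citations: list[str]) -> str:
    """Append citation URLs as numbered footnotes under the answer text.

    Assumes inline markers like [1] are 1-based indices into `citations`.
    """
    notes = "\n".join(f"[{i}] {url}" for i, url in enumerate(citations, start=1))
    return f"{content}\n\n{notes}"

answer = "Anthropic shipped a new model.[1] Pricing is unchanged.[2]"
urls = ["https://anthropic.com/news", "https://techcrunch.com/article"]
print(footnotes(answer, urls))
```

In a web UI you would instead turn each [n] marker into an anchor link pointing at the matching footnote, but the index mapping is the same.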

Q: Rate limits? A: Standard tier: ~50 RPM on sonar, ~20 RPM on sonar-pro. Higher tiers are available in console.perplexity.ai. For production scaling beyond that, talk to Perplexity Sales; they offer dedicated capacity.
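When you run near those RPM limits, retrying 429s with exponential backoff keeps bursts from failing outright. A minimal sketch under stated assumptions: `with_backoff` is a hypothetical generic wrapper; with the OpenAI-compatible client you would pass `retry_on=(openai.RateLimitError,)`, since that is the exception it raises on HTTP 429:

```python
import time

def with_backoff(call, max_attempts=4, base_delay=1.0, retry_on=(Exception,)):
    """Retry `call()` with exponential backoff on the given exceptions.

    Delays grow as base_delay * 2**attempt (1 s, 2 s, 4 s, ...). The
    attempt count and delays are illustrative, not Perplexity guidance.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))
```

Usage would look like `with_backoff(lambda: client.chat.completions.create(model="sonar", messages=msgs), retry_on=(openai.RateLimitError,))`. For sustained throughput, rate-limit on your side instead of leaning on retries.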


Quick Use

  1. Get PPLX_API_KEY at perplexity.ai/settings/api
  2. OpenAI(base_url='https://api.perplexity.ai', api_key=PPLX_KEY)
  3. Use model='sonar-pro' and read resp.citations for source URLs


Source & Thanks

Built by Perplexity. Sonar API docs at docs.perplexity.ai.

Official SDK pending; OpenAI-compatible client works today.

🙏

