TOKREPO · ARSENAL

Stable

MCP Search + RAG Tools

Ten picks for an AI agent that needs to search the live web and your own docs. Tavily MCP first for AI-shaped web search, Exa and Firecrawl as alternate backends, omnisearch to multiplex providers, Perplexity Sonar plus citation rendering for grounded answers, Qdrant and haiku.rag for vector-backed personal-doc RAG, lnav for local log search, and Ragas to score whether the whole stack is actually grounded. Install in order.

10 assets

About this pack

What this pack actually solves

A developer wiring search and RAG into an AI agent runs into the same wall every time. Each web-search API has its own response shape. Each vector store has its own embedding contract. Each citation system disagrees about what counts as a source. And nobody can tell you whether the answer the LLM produced was actually grounded in the snippets it was handed.

This pack picks ten servers and APIs that together cover the full pipeline — live web search → multi-provider routing → grounded answers with citations → private-doc retrieval → local log search → grounding evaluation — and orders them so each install unlocks the next layer. By the end your agent can search the live web, retrieve from your own corpus, and emit answers you can audit.

Install in this order

1. Tavily MCP / Tavily Search API: start here. The response shape is built for LLMs: snippets, citation metadata, and an optional generated answer in one call. Compared to a general-purpose web search wrapper, you skip the parse-the-SERP step. The Tavily docs publish the request and response schemas, and there is a free tier you can wire up the same hour.
2. Exa MCP Server (remote): the second web-search MCP you install, not the first. Exa is an embeddings-native search index, so it returns results that are semantically close to the query rather than keyword-matched. Register it alongside Tavily and tell the agent how to choose: keyword-style queries to Tavily, exploratory or concept queries to Exa.
3. Firecrawl MCP: the search-and-scrape combo. Where Tavily and Exa return snippets, Firecrawl pulls the full cleaned page as markdown. Use it when a snippet is not enough: long-form articles, docs sites, anything you intend to feed back into the LLM context window. The MCP exposes both search and scrape so the agent can decide per call.
4. mcp-omnisearch (Unified Search MCP Server): once you have two or three providers wired up, omnisearch is the multiplexer. One tool name, many backends. The agent picks the provider by hint or you set a default; failures cascade to the next backend. Install it after you have at least two providers, otherwise it is overhead with nothing to multiplex.
5. Perplexity Sonar API (Search-Grounded LLM): a different shape from Tavily and Exa. You send a question and get back an answer with citations baked in. The grounding work happens server-side. Use it when the agent's job is to answer, not to retrieve. Pair it with the next item to render sources.
6. Perplexity Citations (Render Source Footnotes): the rendering half. Sonar returns citation arrays; this asset turns them into clean source footnotes you can drop into the agent's reply or a Markdown UI. Install both as a pair, not separately. According to the Perplexity API reference, each citation is a structured object with URL, title, and snippet; render it, don't paraphrase it.
7. Qdrant MCP (Vector Search Engine): the private-doc layer. Index your docs, code, runbooks, and transcripts as embeddings; the MCP exposes qdrant-find and qdrant-store tools the agent calls. Qdrant is the standard choice because it has a stable MCP wrapper and a permissive Apache 2.0 license. Start with single-node Docker and scale only if you outgrow it.
8. haiku.rag (Agentic RAG CLI + MCP Server): the second RAG MCP you install. Where Qdrant MCP is a thin vector wrapper, haiku.rag is the opinionated RAG pipeline (chunking, retrieval, optional reranking, citation surfaces), exposed both as a CLI you can run locally and an MCP your agent calls. Pick this when you do not want to build a pipeline yourself; pick raw Qdrant MCP when you do.
9. lnav (The Logfile Navigator): the local-search escape hatch. Web search and vector RAG cover the content tier; lnav covers the operational tier. The agent shells out to lnav, runs SQL over your local logs, and returns timestamped matches. According to the lnav docs it indexes common log formats automatically, so agents pick it up with no extra prompting.
10. Ragas (Evaluate RAG & LLM Applications): closes the loop. After the pipeline is wired, Ragas scores answer faithfulness, context precision, and context recall against your eval set. This is how you discover that omnisearch routed the wrong provider, or that Qdrant returned semantically-close-but-wrong chunks, or that the LLM paraphrased the source into something the source did not actually say.
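
The "failures cascade to the next backend" behaviour described for omnisearch can be sketched in plain Python. This illustrates the pattern only and is not omnisearch's actual code; the `SearchHit` type and the two stand-in backends are hypothetical placeholders for real Tavily or Exa clients.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class SearchHit:
    """Hypothetical normalized result shape shared by every backend."""
    url: str
    snippet: str
    provider: str


# A provider is any callable that takes a query and returns hits (or raises).
Provider = Callable[[str], List[SearchHit]]


def cascade_search(query: str, providers: Sequence[Tuple[str, Provider]]) -> List[SearchHit]:
    """Try each backend in order and return the first non-empty result set."""
    errors = []
    for name, search in providers:
        try:
            hits = search(query)
        except Exception as exc:  # a real router would catch narrower errors
            errors.append(f"{name}: {exc}")
            continue
        if hits:
            return hits
        errors.append(f"{name}: no results")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in backends for demonstration; real ones would call hosted APIs.
def flaky_backend(query: str) -> List[SearchHit]:
    raise TimeoutError("simulated timeout")


def working_backend(query: str) -> List[SearchHit]:
    return [SearchHit(url="https://example.com", snippet=f"about {query}", provider="working")]


hits = cascade_search(
    "vector databases",
    [("flaky", flaky_backend), ("working", working_backend)],
)
print(hits[0].provider)
```

Collecting the per-provider errors and raising them together keeps a total outage visible instead of returning an empty list the agent might mistake for "no results exist".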

How they fit together

          [ Live web tier ]              [ Private corpus tier ]      [ Local ops tier ]
          Tavily MCP                     Qdrant MCP                   lnav (SQL on logs)
          Exa MCP                        haiku.rag
          Firecrawl MCP                          │
               │                                 │
               └─── mcp-omnisearch ──┐           │
                                     │           │
                            Perplexity Sonar API │
                                     │           │
                            Perplexity Citations │
                                     │           │
                                     └──── agent answer ────► Ragas eval set

The spine is Tavily MCP + Perplexity Sonar + Citations + Qdrant MCP + Ragas. That quintet covers most search-and-RAG work for a single-developer agent. Exa MCP and Firecrawl MCP are alternate web backends you swap in for semantic search or full-page scrape. omnisearch is the router once you have a menu. haiku.rag is the opinionated RAG branch when you do not want to assemble a pipeline yourself. lnav is the local-ops branch when the question is about logs, not docs.

Web search vs RAG vs grounded answer — pick the right tier

Web search MCPs (Tavily, Exa, Firecrawl) — the agent gets snippets or full pages, then decides what to do. Best for exploration, fact-finding, link discovery. Worst when the user wants an answer and not a research report.
Grounded-answer APIs (Perplexity Sonar + Citations) — the API does retrieval and synthesis server-side and returns one answer plus sources. Best for chatbots and copilots. Worst when you need control over which sources got used.
Vector RAG (Qdrant MCP, haiku.rag) — the agent retrieves from your corpus. Best for private docs, internal runbooks, code archaeology. Worst for anything that changes faster than your re-indexing schedule.
Local search (lnav) — SQL over local logs. Best for ops questions. Worst for content questions; do not use it as a general retrieval tier.

Most real agents mix at least two of these tiers. The mistake people make is picking one — usually vector RAG — and trying to make it do everything.
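
One way to picture mixing tiers is a small dispatch function that maps a request to one of the four tiers above. This is a minimal sketch under stated assumptions: the `Request` flags and the priority order are hypothetical, and in practice the model often makes this choice itself from tool descriptions.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    WEB_SEARCH = "web search MCP"            # Tavily, Exa, Firecrawl
    GROUNDED_ANSWER = "grounded-answer API"  # Perplexity Sonar + Citations
    VECTOR_RAG = "vector RAG"                # Qdrant MCP or haiku.rag
    LOCAL_LOGS = "local log search"          # lnav


@dataclass
class Request:
    """Hypothetical request descriptor; the flags are set upstream."""
    question: str
    about_logs: bool = False
    about_private_docs: bool = False
    wants_direct_answer: bool = False


def pick_tier(req: Request) -> Tier:
    """Choose one tier per call, most specific data source first."""
    if req.about_logs:
        return Tier.LOCAL_LOGS
    if req.about_private_docs:
        return Tier.VECTOR_RAG
    if req.wants_direct_answer:
        return Tier.GROUNDED_ANSWER
    return Tier.WEB_SEARCH  # exploration, fact-finding, link discovery


requests = [
    Request("Why did the worker restart at 03:00?", about_logs=True),
    Request("What does our runbook say about key rotation?", about_private_docs=True),
    Request("What is the current LTS release of Node.js?", wants_direct_answer=True),
    Request("Find articles comparing vector databases"),
]
for req in requests:
    print(f"{req.question!r} -> {pick_tier(req).value}")
```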

Citations and grounding — do not skip the eval step

The whole point of stacking search and RAG behind an agent is that you can prove where each claim came from. That promise only holds if you actually measure it. Ragas computes three metrics that matter for this pack:

Faithfulness — does the answer follow from the retrieved context, or did the LLM invent something? This is the metric that catches "the model paraphrased the source into a claim the source does not make."
Context precision — of the chunks retrieved, how many were relevant? Low precision means your vector store is over-retrieving and the LLM is being asked to filter noise it should not see.
Context recall — of the chunks that should have been retrieved, how many were? Low recall means your chunking or embedding is missing the answer entirely.

Wire Ragas on day one with a tiny eval set. Even five questions with hand-labeled correct contexts are enough to catch regressions when you swap providers behind omnisearch or re-tune your chunker.
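
To build intuition for what a hand-labeled eval set measures, here is a simplified set-overlap version of context precision and context recall. It is not the Ragas API and not how Ragas computes its scores (Ragas relies on LLM judgment and ranking); the `EvalCase` structure and chunk IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class EvalCase:
    question: str
    relevant_ids: Set[str]    # hand-labeled chunk IDs that should be retrieved
    retrieved_ids: List[str]  # chunk IDs the pipeline actually returned


def context_precision(case: EvalCase) -> float:
    """Share of retrieved chunks that are labeled relevant."""
    if not case.retrieved_ids:
        return 0.0
    relevant_hits = sum(1 for cid in case.retrieved_ids if cid in case.relevant_ids)
    return relevant_hits / len(case.retrieved_ids)


def context_recall(case: EvalCase) -> float:
    """Share of labeled-relevant chunks that were retrieved."""
    if not case.relevant_ids:
        return 1.0
    found = case.relevant_ids.intersection(case.retrieved_ids)
    return len(found) / len(case.relevant_ids)


case = EvalCase(
    question="How do we rotate API keys?",
    relevant_ids={"runbook-12", "runbook-13"},
    retrieved_ids=["runbook-12", "faq-3", "blog-7", "changelog-1"],
)
print(context_precision(case))  # 1 of 4 retrieved chunks is relevant: 0.25
print(context_recall(case))     # 1 of 2 relevant chunks was retrieved: 0.5
```

Tracking these two numbers per question before and after a backend swap is usually enough to see whether a change made retrieval noisier (precision drops) or blinder (recall drops).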

Common pitfalls

Installing two web-search MCPs without routing logic: the agent has no rule for choosing, so which backend it calls varies from one request to the next. Either set a default in the system prompt or add omnisearch as soon as the second backend is in place.
Treating Perplexity Sonar as a search API: Sonar already synthesizes. If you wanted snippets to feed to your own LLM, you wanted Tavily or Exa. Sonar's value is the grounded answer and citations together.
Qdrant without a re-indexing job: vector RAG silently rots. The corpus drifts, your embeddings stay stale, and the agent confidently retrieves answers that are six months out of date. Schedule re-indexing on day one.
haiku.rag and Qdrant MCP both wired up: they overlap. Pick one as the canonical RAG path; the other can stay disabled. Two RAG layers are worse than one because the agent does not know which to trust.
Citations rendered as paraphrases: if you let the LLM rewrite the citation text, you have broken the audit trail. Render the Perplexity Citations payload verbatim with a link, not as prose.
Skipping Ragas: without an eval set, every "the answer looks right" gut-check is a regression waiting to happen the next time you swap a backend.
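
For the re-indexing pitfall, the core of a scheduled job is deciding which documents changed since they were last embedded. The sketch below covers only that step and makes no Qdrant calls; the `indexed_at` bookkeeping map and the `*.md` glob are assumptions for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def find_stale_docs(doc_dir: Path, indexed_at: Dict[str, float]) -> List[Path]:
    """Return docs that are new or modified since they were last embedded.

    `indexed_at` maps a file path (as a string) to the modification time
    recorded when that file was last indexed. It is hypothetical bookkeeping;
    store it wherever suits your pipeline.
    """
    stale = []
    for path in sorted(doc_dir.rglob("*.md")):
        last_indexed = indexed_at.get(str(path))
        if last_indexed is None or path.stat().st_mtime > last_indexed:
            stale.append(path)
    return stale


# Demonstration on a throwaway directory with fixed modification times.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    fresh = root / "fresh.md"
    changed = root / "changed.md"
    fresh.write_text("unchanged since indexing")
    changed.write_text("edited after indexing")
    os.utime(fresh, (1_000, 1_000))    # mtime equals the recorded index time
    os.utime(changed, (2_000, 2_000))  # mtime is newer than the index time
    indexed_at = {str(fresh): 1_000.0, str(changed): 1_000.0}
    stale_names = [p.name for p in find_stale_docs(root, indexed_at)]

print(stale_names)  # ['changed.md']
```

A scheduled job would re-embed only the returned paths and then update the recorded times, which keeps each run cheap enough to run daily.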

INSTALL · ONE COMMAND

$ tokrepo install pack/mcp-search-rag-tools

hand it to your agent — or paste it in your terminal

What's inside

10 assets in this pack

Agent#01

Tavily Search — Search API Built for AI Agents

Tavily Search returns LLM-ready answers from the web — not link lists. One call gets snippets, citations, optional generated answer. Free tier 1K/mo.

by Tavily·302 views

$ tokrepo install tavily-search-search-api-built-for-ai-agents

MCP#02

Exa MCP Server — Remote Search Tools for Agents

Exa MCP Server connects clients to Exa’s hosted web/code search tools via a remote MCP URL, with simple config for Cursor, VS Code, Claude Code, and Codex.

by MCP Hub·247 views

$ tokrepo install exa-mcp-server-remote-search-tools-for-agents

MCP#03

Firecrawl MCP — Web Search & Scrape Tools

Add Firecrawl MCP to your agent to search, scrape, and extract full-page content. Run via npx with an API key; fits Cursor, Claude Code, VS Code.

by MCP Hub·182 views

$ tokrepo install firecrawl-mcp-web-search-scrape-tools

MCP#04

mcp-omnisearch — Unified Search MCP Server

Run mcp-omnisearch as an MCP server to unify Tavily, Brave, Kagi, Exa, GitHub search, and extraction tools behind one interface.

by MCP Hub·176 views

$ tokrepo install mcp-omnisearch-unified-search-mcp-server

Skill#05

Perplexity Sonar API — Search-Grounded LLM in One Call

Perplexity Sonar API returns LLM answers grounded in real-time web search with citations. Tiers: sonar / sonar-pro / sonar-reasoning.

by Perplexity·209 views

$ tokrepo install perplexity-sonar-api-search-grounded-llm-in-one-call

Skill#06

Perplexity Citations — Render Source Footnotes in Your UI

Parse Perplexity inline citation markers ([1][2][3]) + the citations URL array into clickable footnote UI. Markdown render, hover preview.

by Perplexity·180 views

$ tokrepo install perplexity-citations-render-source-footnotes-in-your-ui
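
A minimal sketch of the parsing step described in this card, assuming the payload is an answer string with `[n]` markers plus a list of URLs where marker `[1]` maps to the first URL. The function name and output layout are hypothetical, not part of the asset or of Perplexity's API.

```python
import re
from typing import List


def render_footnotes(answer: str, citations: List[str]) -> str:
    """Append a source list for every [n] marker actually used in `answer`.

    URLs are emitted verbatim and never rewritten. Markers that point outside
    the citations list are ignored rather than guessed at.
    """
    used = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        index = int(match.group(1))
        if 1 <= index <= len(citations) and index not in used:
            used.append(index)
    if not used:
        return answer
    lines = [f"[{i}] {citations[i - 1]}" for i in sorted(used)]
    return answer + "\n\nSources:\n" + "\n".join(lines)


answer = "The first claim is sourced [1] and so is the second [2]."
citations = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/unused",
]
rendered = render_footnotes(answer, citations)
print(rendered)
```

Listing only the markers that appear in the text keeps the footnotes honest: a URL the answer never cites does not show up as if it supported something.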

MCP#07

Qdrant MCP — Vector Search Engine for AI Agents

MCP server for Qdrant vector database. Gives AI agents the power to store and search embeddings for RAG, semantic search, and recommendation systems. 22,000+ stars on Qdrant.

by MCP Hub·354 views

$ tokrepo install qdrant-mcp-vector-search-engine-ai-agents-301ce58e

MCP#08

haiku.rag — Agentic RAG CLI + MCP Server

haiku.rag is an agentic RAG toolkit with CLI, Python API, and MCP server; verified 524★ and supports `add-src`, `ask --cite`, and `serve --mcp`.

by Agent Toolkit·210 views

$ tokrepo install haiku-rag-agentic-rag-cli-mcp-server

Skill#09

lnav — The Logfile Navigator with SQL and Live Tailing

lnav is an advanced log file viewer that understands dozens of log formats, provides SQL queries against log records, live-tails rotating files, and merges multiple logs by timestamp into one view.

by Script Depot·205 views

$ tokrepo install lnav-logfile-navigator-sql-live-tailing-4493f997
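
A hedged sketch of how an agent might shell out to lnav for a SQL query, using lnav's headless mode (`-n`) and command flag (`-c`). The helper names are hypothetical, and the table and column names (`syslog_log`, `log_time`, `log_level`, `log_body`) are assumptions that depend on the log format lnav detects for your files.

```python
import subprocess
from typing import List


def build_lnav_argv(sql: str, log_paths: List[str]) -> List[str]:
    """Build a headless lnav invocation that runs one SQL query.

    -n disables the interactive UI; -c runs a command after the files load.
    A leading ';' tells lnav the command is an SQL statement.
    """
    return ["lnav", "-n", "-c", f";{sql}", *log_paths]


def run_lnav_query(sql: str, log_paths: List[str]) -> str:
    """Run the query and return lnav's stdout (requires lnav on PATH)."""
    result = subprocess.run(
        build_lnav_argv(sql, log_paths),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Assumed table and columns; adjust to the format lnav reports for your logs.
query = "SELECT log_time, log_body FROM syslog_log WHERE log_level = 'error' LIMIT 20"
argv = build_lnav_argv(query, ["/var/log/syslog"])
print(argv)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the SQL itself contains quotes.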

Skill#10

Ragas — Evaluate RAG & LLM Applications

Ragas evaluates LLM applications with objective metrics, test data generation, and data-driven insights. 13.2K+ GitHub stars. RAG evaluation, auto test generation. Apache 2.0.

by Script Depot·298 views

$ tokrepo install ragas-evaluate-rag-llm-applications-2c856b4d

Frequently asked questions

Which MCP do I install first if I only have time for one?

Tavily MCP. Its response shape is the closest to what an LLM can use directly — snippets, citation metadata, an optional generated answer in one call. Every other web-search MCP in this pack is an optimization on top: Exa for semantic search, Firecrawl for full-page scrape, omnisearch for routing. If you skip Tavily and start with raw Brave or a generic search wrapper, the agent spends tokens parsing a SERP-shaped response that was never built for it.

What is the difference between Perplexity Sonar and Tavily — both return search results, right?

They sit at different tiers. Tavily returns snippets and lets your LLM synthesize the answer. Perplexity Sonar runs the retrieval and the synthesis server-side and returns one answer plus structured citations. Use Tavily when you want to control which sources go into the model's context window. Use Sonar when you want a shipped answer with sources baked in. If you stack them — Tavily first to gather candidates, Sonar for the synthesis — you are paying twice and confusing the audit trail. Pick one tier per call.

Do I need both Qdrant MCP and haiku.rag?

No, and wiring both usually hurts. Qdrant MCP is a thin vector wrapper — you bring your own chunking and retrieval logic. haiku.rag is an opinionated RAG pipeline that handles chunking, retrieval, and citation surfaces for you, exposed both as a CLI and an MCP. Pick Qdrant MCP if you have an existing pipeline and want a stable MCP shim in front of it. Pick haiku.rag if you do not want to build a pipeline at all. If you install both, set one as the canonical retrieval path in the system prompt so the agent does not arbitrate between two RAG layers.

How do I make sure the agent's citations are real and not paraphrased?

Three layers. (1) Render the Perplexity Citations payload verbatim — URL, title, snippet — never let the LLM rewrite the citation text. (2) For your own RAG layer, return the chunk text and its document ID in the tool response; instruct the agent to quote, not summarize, when it cites. (3) Run Ragas faithfulness on a small eval set; faithfulness explicitly catches the case where the answer drifts from the retrieved context. The third step is what catches regressions when you swap backends behind omnisearch or re-tune a chunker.
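
The "quote, not summarize" instruction in layer (2) can be spot-checked mechanically. The sketch below flags any double-quoted span in an answer that does not appear in the retrieved chunks; it is a crude string check, not a substitute for Ragas faithfulness, and the `Chunk` type and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    doc_id: str
    text: str


def _normalize(text: str) -> str:
    """Collapse whitespace and case so line wraps do not break matching."""
    return " ".join(text.lower().split())


def unsupported_quotes(answer: str, chunks: List[Chunk]) -> List[str]:
    """Return every double-quoted span in `answer` not found in any chunk."""
    corpus = [_normalize(chunk.text) for chunk in chunks]
    missing = []
    for quote in re.findall(r'"([^"]+)"', answer):
        if not any(_normalize(quote) in text for text in corpus):
            missing.append(quote)
    return missing


chunks = [
    Chunk("runbook-12", "Rotate API keys every 90 days.\nRevoke the old key after deploy."),
]
answer = 'The runbook says "rotate API keys every 90 days" and "keys never expire".'
print(unsupported_quotes(answer, chunks))  # ['keys never expire']
```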

Will any of this work offline or in air-gapped environments?

Partially. Tavily, Exa, Firecrawl, and Perplexity Sonar are hosted APIs and require network egress. Qdrant runs locally — single-node Docker is enough for most personal-doc corpora. haiku.rag runs locally. lnav is fully local. Ragas can run against a local LLM judge if you wire one in. So the air-gapped subset is Qdrant + haiku.rag + lnav + Ragas-with-local-judge. The web search and grounded-answer tiers do not have offline equivalents in this pack — you would need a self-hosted search index like Hister or Elasticsearch MCP as a substitute, and you give up the AI-shaped response format.
