MCP Search + RAG Tools
Ten picks for an AI agent that needs to search the live web and your own docs. Tavily MCP first for AI-shaped web search, Exa and Firecrawl as alternate backends, omnisearch to multiplex providers, Perplexity Sonar plus its citations endpoint for grounded answers, Qdrant and haiku.rag for vector-backed personal-doc RAG, lnav for local log search, and Ragas to score whether the whole stack is actually grounded. Install in order.
What this pack actually solves
A developer wiring search and RAG into an AI agent runs into the same wall every time. Each web-search API has its own response shape. Each vector store has its own embedding contract. Each citation system disagrees about what counts as a source. And nobody can tell you whether the answer the LLM produced was actually grounded in the snippets it was handed.
This pack picks ten servers and APIs that together cover the full pipeline — live web search → multi-provider routing → grounded answers with citations → private-doc retrieval → local log search → grounding evaluation — and orders them so each install unlocks the next layer. By the end your agent can search the live web, retrieve from your own corpus, and emit answers you can audit.
Install in this order
- Tavily MCP / Tavily Search API — start here. The response shape is built for LLMs: snippets, citation metadata, an optional generated answer in one call. Compared to a general-purpose web search wrapper, you skip the parse-the-SERP step. The Tavily docs publish the request and response schemas and a free tier you can wire up the same hour.
- Exa MCP Server (remote) — the second web-search MCP you install, not the first. Exa is an embeddings-native search index — it returns results that are semantically close to the query rather than keyword-matched. Put it behind the same tool name as Tavily and let the agent pick: keyword-style queries to Tavily, exploratory/concept queries to Exa.
- Firecrawl MCP: the search-and-scrape combo. Where Tavily and Exa return snippets, Firecrawl pulls the full cleaned page as markdown. Use it when a snippet is not enough: long-form articles, docs sites, anything you intend to feed back into the LLM context window. The MCP exposes both search and scrape so the agent can decide per call.
- mcp-omnisearch (Unified Search MCP Server): once you have two or three providers wired up, omnisearch is the multiplexer. One tool name, many backends. The agent picks the provider by hint or you set a default; failures cascade to the next backend. Install it after you have at least two providers, otherwise it is overhead with nothing to multiplex.
- Perplexity Sonar API — Search-Grounded LLM — different shape from Tavily/Exa. You send a question, you get back an answer with citations baked in. The grounding work happens server-side. Use it when the agent's job is to answer, not to retrieve. Pair it with the next item to render sources.
- Perplexity Citations — Render Source Footnotes — the rendering half. Sonar returns citation arrays; this endpoint produces clean source footnotes you can drop into the agent's reply or a Markdown UI. Install both as a pair, not separately. According to the Perplexity API reference each citation is a structured object with URL, title, and snippet — render it, don't paraphrase it.
- Qdrant MCP (Vector Search Engine): the private-doc layer. Index your docs, code, runbooks, transcripts as embeddings; the MCP exposes qdrant-find and qdrant-store tools the agent calls. Qdrant is the standard choice because it has a stable MCP wrapper and a permissive Apache 2.0 license. Start with single-node Docker, scale only if you outgrow it.
- haiku.rag (Agentic RAG CLI + MCP Server): the second RAG MCP you install. Where Qdrant MCP is a thin vector wrapper, haiku.rag is the opinionated RAG pipeline (chunking, retrieval, optional reranking, citation surfaces) exposed both as a CLI you can run locally and an MCP your agent calls. Pick this when you do not want to build a pipeline yourself; pick raw Qdrant MCP when you do.
- lnav — The Logfile Navigator — the local-search escape hatch. Web search and vector RAG cover the content tier; lnav covers the operational tier. The agent shells out to lnav, runs SQL over your local logs, returns timestamped matches. According to the lnav docs it indexes common log formats automatically — agents pick it up with no extra prompting.
- Ragas — Evaluate RAG & LLM Applications — closes the loop. After the pipeline is wired, Ragas scores answer faithfulness, context precision, and context recall against your eval set. This is how you discover that omnisearch routed the wrong provider, or that Qdrant returned semantically-close-but-wrong chunks, or that the LLM paraphrased the source into something the source did not actually say.
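The cascade behavior described for omnisearch in the list above (one tool name, many backends, failures fall through to the next) can be sketched in a few lines. This is an illustration of the idea, not omnisearch's actual implementation: SearchHit, search_with_fallback, and the two stub providers are hypothetical names, and real adapters would wrap the Tavily, Exa, or Firecrawl tools and map each response into the common shape.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SearchHit:
    """Common result shape the agent sees, whichever backend answered."""
    url: str
    title: str
    snippet: str
    provider: str


# A provider is any callable that takes a query and returns hits (assumed interface).
Provider = Callable[[str], list[SearchHit]]


def search_with_fallback(
    query: str, providers: Iterable[tuple[str, Provider]]
) -> list[SearchHit]:
    """Try each provider in order and return the first non-empty result."""
    errors: list[str] = []
    for name, provider in providers:
        try:
            hits = provider(query)
        except Exception as exc:  # one failed backend should not fail the call
            errors.append(f"{name}: {exc}")
            continue
        if hits:
            return hits
        errors.append(f"{name}: no results")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky_provider(query: str) -> list[SearchHit]:
    """Stand-in for a backend that is currently down."""
    raise TimeoutError("simulated timeout")


def stub_provider(query: str) -> list[SearchHit]:
    """Stand-in for a healthy backend."""
    return [SearchHit("https://example.com/a", "Example A", f"About {query}", "stub")]


if __name__ == "__main__":
    hits = search_with_fallback(
        "mcp search", [("flaky", flaky_provider), ("stub", stub_provider)]
    )
    print(hits[0].provider, hits[0].url)
```

Keeping the provider name on every hit is a deliberate choice in this sketch: when an eval run later shows a bad answer, you can see which backend supplied the context.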
How they fit together
[ Live web tier ]               [ Private corpus tier ]     [ Local ops tier ]
  Tavily MCP                      Qdrant MCP                  lnav (SQL on logs)
  Exa MCP                         haiku.rag
  Firecrawl MCP                            │
      │                                    │
      └─── mcp-omnisearch ──┐              │
                            │              │
                  Perplexity Sonar API     │
                            │              │
                  Perplexity Citations     │
                            │              │
                            └──── agent answer ────► Ragas eval set
The spine is Tavily MCP + Perplexity Sonar + Citations + Qdrant MCP + Ragas. Those five cover most search-and-RAG work for a single-developer agent. Exa MCP and Firecrawl MCP are alternate web backends you swap in for semantic search or full-page scrape. omnisearch is the router once you have a menu. haiku.rag is the opinionated RAG branch when you do not want to assemble a pipeline yourself. lnav is the local-ops branch when the question is about logs, not docs.
Web search vs RAG vs grounded answer — pick the right tier
- Web search MCPs (Tavily, Exa, Firecrawl) — the agent gets snippets or full pages, then decides what to do. Best for exploration, fact-finding, link discovery. Worst when the user wants an answer and not a research report.
- Grounded-answer APIs (Perplexity Sonar + Citations) — the API does retrieval and synthesis server-side and returns one answer plus sources. Best for chatbots and copilots. Worst when you need control over which sources got used.
- Vector RAG (Qdrant MCP, haiku.rag) — the agent retrieves from your corpus. Best for private docs, internal runbooks, code archaeology. Worst for anything that changes faster than your re-indexing schedule.
- Local search (lnav) — SQL over local logs. Best for ops questions. Worst for content questions; do not use it as a general retrieval tier.
Most real agents mix at least two of these tiers. The mistake people make is picking one — usually vector RAG — and trying to make it do everything.
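One way to keep the tiers from blurring is to make the choice explicit in code or in the system prompt. The sketch below is a hypothetical router: the Tier enum, the pick_tier function, its three flags, and their priority order are all assumptions made for illustration, and a real agent would likely derive the flags from the request rather than receive them as booleans.

```python
from enum import Enum


class Tier(Enum):
    WEB_SEARCH = "web search (Tavily, Exa, Firecrawl)"
    GROUNDED_ANSWER = "grounded answer (Perplexity Sonar + Citations)"
    VECTOR_RAG = "vector RAG (Qdrant MCP or haiku.rag)"
    LOCAL_LOGS = "local search (lnav)"


def pick_tier(
    *, about_logs: bool, private_corpus: bool, wants_final_answer: bool
) -> Tier:
    """Map three yes/no facts about a request onto one tier.

    The flags and their priority order are illustrative assumptions.
    """
    if about_logs:
        return Tier.LOCAL_LOGS       # ops questions go to SQL over logs
    if private_corpus:
        return Tier.VECTOR_RAG       # the answer lives in your own docs
    if wants_final_answer:
        return Tier.GROUNDED_ANSWER  # the user wants an answer, not a report
    return Tier.WEB_SEARCH           # exploration, fact-finding, link discovery


if __name__ == "__main__":
    tier = pick_tier(about_logs=False, private_corpus=False, wants_final_answer=True)
    print(tier.value)
```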
Citations and grounding — do not skip the eval step
The whole point of stacking search and RAG behind an agent is that you can prove where each claim came from. That promise only holds if you actually measure it. Ragas computes three metrics that matter for this pack:
- Faithfulness — does the answer follow from the retrieved context, or did the LLM invent something? This is the metric that catches "the model paraphrased the source into a claim the source does not make."
- Context precision — of the chunks retrieved, how many were relevant? Low precision means your vector store is over-retrieving and the LLM is being asked to filter noise it should not see.
- Context recall — of the chunks that should have been retrieved, how many were? Low recall means your chunking or embedding is missing the answer entirely.
Wire Ragas on day one with a tiny eval set — even five questions with hand-labeled correct contexts is enough to catch regressions when you swap providers behind omnisearch or re-tune your chunker.
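To make the two context metrics concrete, here is a set-based simplification computed by hand over a tiny labeled eval set. It is not how Ragas computes its scores (Ragas relies on LLM or reference-based judgments and its definitions differ in detail), and the chunk IDs and the EVAL_SET structure are hypothetical. Faithfulness is left out because it needs a judge model.

```python
from statistics import mean


def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Share of retrieved chunk IDs that are labeled relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for chunk_id in retrieved if chunk_id in relevant) / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Share of labeled-relevant chunk IDs that were actually retrieved."""
    if not relevant:
        return 1.0  # nothing to find, so nothing was missed
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical eval set: (retrieved chunk IDs, hand-labeled relevant chunk IDs).
EVAL_SET = [
    (["c1", "c2", "c3", "c4"], {"c1", "c3", "c9"}),  # 2 of 4 relevant, 2 of 3 found
    (["d1", "d2"], {"d1", "d2"}),                    # perfect retrieval
]

precision = mean(context_precision(r, rel) for r, rel in EVAL_SET)
recall = mean(context_recall(r, rel) for r, rel in EVAL_SET)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.75 recall=0.83
```

Re-running the same numbers after each backend swap or chunker change gives you a cheap regression signal before you bring in the full Ragas metrics.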
Common pitfalls
- Installing two web-search MCPs without routing logic: the agent has no basis for choosing and will pick arbitrarily on each call. Either set a default in the system prompt or put omnisearch in front of both as soon as the second backend is wired up.
- Treating Perplexity Sonar as a search API — Sonar already synthesizes. If you wanted snippets to feed to your own LLM, you wanted Tavily or Exa. Sonar's value is the grounded answer + citations together.
- Qdrant without a re-indexing job — vector RAG silently rots. The corpus drifts, your embeddings stay stale, and the agent confidently retrieves answers that are six months out of date. Schedule re-indexing on day one.
- haiku.rag and Qdrant MCP both wired up — they overlap. Pick one as the canonical RAG path; the other can stay disabled. Two RAG layers is worse than one because the agent does not know which to trust.
- Citations rendered as paraphrases — if you let the LLM rewrite the citation text, you have broken the audit trail. Render the Perplexity Citations payload verbatim with a link, not as prose.
- Skipping Ragas — without an eval set, every "the answer looks right" gut-check is a regression waiting to happen the next time you swap a backend.
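The re-indexing pitfall above is easier to avoid if the scheduled job knows exactly what changed. A minimal sketch, assuming you store a content hash next to each indexed document: plan_reindex and content_hash are hypothetical helper names, and the actual upsert and delete calls against your vector store are left out.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's current text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(
    current_docs: dict[str, str], indexed_hashes: dict[str, str]
) -> dict[str, list[str]]:
    """Compare the live corpus with what the index last saw.

    current_docs maps doc ID to current text; indexed_hashes maps doc ID
    to the hash recorded at the last indexing run.
    """
    upsert = sorted(
        doc_id
        for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)  # new or changed
    )
    delete = sorted(
        doc_id for doc_id in indexed_hashes if doc_id not in current_docs
    )
    return {"upsert": upsert, "delete": delete}


if __name__ == "__main__":
    current = {"a": "hello", "b": "changed text", "c": "brand new"}
    indexed = {
        "a": content_hash("hello"),
        "b": content_hash("old text"),
        "d": content_hash("removed doc"),
    }
    print(plan_reindex(current, indexed))
    # prints: {'upsert': ['b', 'c'], 'delete': ['d']}
```

Hashing keeps the job cheap: unchanged documents are skipped, so you only pay for embeddings on what actually drifted.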
Frequently asked questions
Which MCP do I install first if I only have time for one?
Tavily MCP. Its response shape is the closest to what an LLM can use directly — snippets, citation metadata, an optional generated answer in one call. Every other web-search MCP in this pack is an optimization on top: Exa for semantic search, Firecrawl for full-page scrape, omnisearch for routing. If you skip Tavily and start with raw Brave or a generic search wrapper, the agent spends tokens parsing a SERP-shaped response that was never built for it.
What is the difference between Perplexity Sonar and Tavily — both return search results, right?
They sit at different tiers. Tavily returns snippets and lets your LLM synthesize the answer. Perplexity Sonar runs the retrieval and the synthesis server-side and returns one answer plus structured citations. Use Tavily when you want to control which sources go into the model's context window. Use Sonar when you want a shipped answer with sources baked in. If you stack them — Tavily first to gather candidates, Sonar for the synthesis — you are paying twice and confusing the audit trail. Pick one tier per call.
Do I need both Qdrant MCP and haiku.rag?
No, and wiring both usually hurts. Qdrant MCP is a thin vector wrapper — you bring your own chunking and retrieval logic. haiku.rag is an opinionated RAG pipeline that handles chunking, retrieval, and citation surfaces for you, exposed both as a CLI and an MCP. Pick Qdrant MCP if you have an existing pipeline and want a stable MCP shim in front of it. Pick haiku.rag if you do not want to build a pipeline at all. If you install both, set one as the canonical retrieval path in the system prompt so the agent does not arbitrate between two RAG layers.
How do I make sure the agent's citations are real and not paraphrased?
Three layers. (1) Render the Perplexity Citations payload verbatim — URL, title, snippet — never let the LLM rewrite the citation text. (2) For your own RAG layer, return the chunk text and its document ID in the tool response; instruct the agent to quote, not summarize, when it cites. (3) Run Ragas faithfulness on a small eval set; faithfulness explicitly catches the case where the answer drifts from the retrieved context. The third step is what catches regressions when you swap backends behind omnisearch or re-tune a chunker.
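Layer (1) can be enforced in code rather than by prompt: format the citation fields yourself and never pass them through the model. The sketch below assumes each citation carries the URL, title, and snippet fields described above; the Citation class and render_footnotes function are hypothetical, and the real payload shape should be checked against the API response you actually receive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Assumed citation shape: URL, title, snippet."""
    url: str
    title: str
    snippet: str


def render_footnotes(citations: list[Citation]) -> str:
    """Render numbered footnotes, copying every field verbatim."""
    lines = []
    for number, citation in enumerate(citations, start=1):
        # The snippet is quoted as-is; nothing here is summarized or rewritten.
        lines.append(
            f'[{number}] {citation.title} <{citation.url}>\n    "{citation.snippet}"'
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_footnotes(
            [Citation("https://example.com/post", "Example Title", "Exact snippet text.")]
        )
    )
```

Append the rendered block to the agent's reply after generation, so the model's output and the audit trail stay separate.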
Will any of this work offline or in air-gapped environments?
Partially. Tavily, Exa, Firecrawl, and Perplexity Sonar are hosted APIs and require network egress. Qdrant runs locally — single-node Docker is enough for most personal-doc corpora. haiku.rag runs locally. lnav is fully local. Ragas can run against a local LLM judge if you wire one in. So the air-gapped subset is Qdrant + haiku.rag + lnav + Ragas-with-local-judge. The web search and grounded-answer tiers do not have offline equivalents in this pack — you would need a self-hosted search index like Hister or Elasticsearch MCP as a substitute, and you give up the AI-shaped response format.