Introduction
MindSearch is an open-source AI-powered search engine developed by the InternLM team at Shanghai AI Lab. It uses a multi-agent system to break down complex questions into sub-queries, search the web in parallel, and produce well-sourced, comprehensive answers similar to how a human researcher would work.
What MindSearch Does
- Decomposes complex user questions into a directed acyclic graph (DAG) of sub-queries
- Dispatches multiple search agents in parallel to gather information for each sub-query
- Aggregates and synthesizes results from dozens of web pages into a coherent answer
- Provides source citations and a visual graph of the reasoning process
- Supports multiple LLM backends and search engines including SearXNG and Bing
Architecture Overview
MindSearch consists of two core agent roles: a Planner and multiple Searchers. The Planner receives the user query and generates a DAG of atomic sub-questions with dependency edges. Independent sub-questions are dispatched to Searcher agents in parallel. Each Searcher calls a web search API, reads the returned pages, and extracts relevant information. Once all sub-tasks complete, the Planner synthesizes the individual findings into a final answer with citations. The system is built on the InternLM agent framework and uses a React-based frontend for the visual interface.
Self-Hosting & Configuration
- Requires Python 3.9+ and a running SearXNG instance or Bing API key for web search
- Configure the LLM backend via environment variables (supports InternLM, OpenAI, and compatible APIs)
- Deploy SearXNG locally with Docker for a fully self-hosted, privacy-respecting setup
- Adjust the maximum number of parallel searcher agents and search depth in the config
- The frontend can be run separately with Node.js or served through the included Gradio/Streamlit interface
Key Features
- Graph-based query decomposition that handles multi-hop, comparative, and aggregation questions
- Parallel search execution dramatically reduces latency compared to sequential approaches
- Visual reasoning graph showing how the answer was assembled from sub-queries
- Model-agnostic design works with both commercial and open-weight LLMs
- Self-hostable with local search via SearXNG for complete data privacy
Comparison with Similar Tools
- Perplexica — single-agent search-and-answer; MindSearch uses multi-agent parallel decomposition for complex queries
- SearchGPT — closed commercial product; MindSearch is fully open source and self-hostable
- SearXNG — privacy metasearch engine without AI synthesis; MindSearch adds LLM-based reasoning on top
- Tavily — search API for AI agents; MindSearch is a full search application, not just an API
- Stanford STORM — research report generation; MindSearch focuses on real-time interactive question answering
FAQ
Q: Can I use MindSearch without an internet connection? A: No. MindSearch requires live web search to retrieve current information. However, the search and LLM components can all be self-hosted on your network.
Q: Which search backends are supported? A: SearXNG (recommended for self-hosting), Bing Web Search API, and DuckDuckGo.
Q: How many web pages does it process per query? A: Depending on query complexity, MindSearch typically reads and synthesizes content from 20 to 60 web pages per question.
Q: Can I use GPT-4 or Claude as the reasoning model? A: Yes. Any OpenAI-compatible API can be configured as the LLM backend.