Multi-Agent Frameworks
CAMEL, LangGraph, DeepAgents, GPT Researcher — frameworks for orchestrating teams of agents in production.
What's in this pack
This pack collects the seven multi-agent assets that teams actually ship to production in 2026, not the demos that look good on Twitter and explode under load. Four are headline frameworks; three are research/role-play templates that wrap them.
| # | Asset | Type | Best for |
|---|---|---|---|
| 1 | LangGraph | stateful framework | Production graph orchestration with checkpointing |
| 2 | CAMEL | role-play framework | Agent-to-agent dialogue, academic-grade |
| 3 | DeepAgents | research framework | Long-running planning + sub-agent spawning |
| 4 | GPT Researcher | applied agent | Topic in, research report out |
| 5 | Researcher swarm | template | CAMEL roles for parallel research |
| 6 | Critic-actor pair | template | One agent acts, one critiques — error correction |
| 7 | Hierarchical planner | template | Manager-spawns-workers pattern with budget |
Why this pack matters
A single agent is a chat loop. Multi-agent is a system — and like every system, it needs structure (state machines, queues, retries) before it survives a real workload. The four frameworks here picked the structures that work. The three templates show you how to wire them together for the most common use cases.
The frameworks each pick a different abstraction:
- LangGraph treats orchestration as a state graph. You declare nodes (agents/tools) and edges (when to transition), and LangGraph handles checkpointing so a 30-minute run can resume after a crash. The closest thing to a default standard for production.
- CAMEL focuses on agent-to-agent dialogue with explicit roles. Two agents play "user" and "assistant" or "research lead" and "writer" and converse until a goal is met. Strong on reproducibility and academic benchmarks.
- DeepAgents is built for long-horizon tasks. The top agent plans, then delegates sub-tasks to spawned sub-agents, each with its own context window. Designed to avoid the "one giant context" failure mode.
- GPT Researcher is the applied case study. You give it a research question, it runs a sub-agent swarm to gather evidence and produces a long-form report with citations. Useful as both a tool and a reference architecture.
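The state-graph abstraction is easy to sketch without any framework. Below is a toy orchestrator in plain Python, not the LangGraph API; the node names, the `router` function, and the checkpoint format are all invented for illustration, but the shape (nodes mutate shared state, edges decide the next hop, state is checkpointed after every step) is the same idea:

```python
import json

class MiniGraph:
    """Toy state-graph runner: nodes are functions over a shared state
    dict, a router picks the next node, and state is checkpointed after
    every step so a crashed run can resume from the last node."""
    def __init__(self):
        self.nodes = {}

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def run(self, state, start, router, checkpoint_path=None):
        node = start
        while node is not None:
            state = self.nodes[node](state)
            if checkpoint_path:            # resume point after a crash
                with open(checkpoint_path, "w") as f:
                    json.dump({"node": node, "state": state}, f)
            node = router(node, state)     # edge logic: decide next hop
        return state

# Wire two "agents": a researcher that gathers notes, a writer that drafts.
graph = MiniGraph()
graph.add_node("research", lambda s: {**s, "notes": ["fact A", "fact B"]})
graph.add_node("write", lambda s: {**s, "draft": f"Report on {s['topic']}: " + "; ".join(s["notes"])})

def router(node, state):
    return {"research": "write", "write": None}[node]

result = graph.run({"topic": "agents"}, "research", router)
print(result["draft"])  # Report on agents: fact A; fact B
```

A real framework adds what this toy omits: conditional edges, parallel branches, and durable checkpoint stores instead of a JSON file.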
Install in one command
```sh
# Install the entire pack
tokrepo install pack/multi-agent-frameworks

# Or install one at a time
tokrepo install langgraph
tokrepo install camel
tokrepo install deepagents
tokrepo install gpt-researcher
```
The TokRepo CLI installs each framework's adapter into your AI tool — Claude Code subagents into `.claude/agents/`, Cursor rules into `.cursor/rules/`, `AGENTS.md` entries for Codex CLI. Run pip / npm for the underlying libraries; TokRepo wires the prompts so your CLI knows when to invoke them.
Common pitfalls
- Don't skip the budget. Multi-agent runs can fan out exponentially — one planner spawning five workers, each spawning five sub-tasks, runs 25 leaf calls and burns roughly 25× the tokens of a single call. Always cap depth and max-spawn count. DeepAgents bakes this in; with LangGraph and CAMEL you set it yourself.
- Don't share an LLM client across threads naively. Most SDKs aren't fully thread-safe under high concurrency. Use process-level pools or async with bounded concurrency (e.g. asyncio.Semaphore(8)).
- Trace everything. Multi-agent debugging without traces is impossible. Pair this pack with the LLM Observability pack — Langfuse and AgentOps both have first-class LangGraph integrations.
- Beware role drift. In CAMEL-style dialogue, agents sometimes forget who they are around turn 8-10. Add a system reminder every N turns or pin the role in every message.
- Multi-agent ≠ better. Try a single Claude Sonnet 4.5 with extended thinking before reaching for a multi-agent system. The 2025 Anthropic blog post on multi-agent research found that 60% of tasks people throw at multi-agent setups would do fine with one agent + tools.
When this pack alone isn't enough
Multi-agent shines on tasks with parallelizable subproblems (research, code review, content generation across topics). It loses on:
- Sequential, deeply-stateful tasks. Refactoring a codebase end-to-end is one agent's job — splitting it across multiple agents creates more coordination overhead than it saves.
- Latency-sensitive workflows. Each hop between agents adds a round-trip. If you're under a 5-second SLA, stay single-agent.
- Cost-sensitive workflows. A multi-agent run typically costs 3-10× a single-agent run for the same task. Worth it for quality on hard problems; not worth it for "summarize this email."
The right way to adopt this pack: start with GPT Researcher as the simplest finished example, then graduate to LangGraph or DeepAgents when you need to write your own orchestration.
Frequently asked questions
Is LangGraph free?
Yes, LangGraph is open-source under MIT and you only pay for the LLM tokens. There's a paid LangGraph Cloud for managed deployment with checkpointing and traces, but the OSS library is fully featured. CAMEL, DeepAgents, and GPT Researcher are also OSS — no paid tier is required to ship.
Does this work with Cursor or Codex CLI?
The frameworks are ordinary Python libraries, not Claude Code-specific. Any agent CLI that runs Python tools can drive them. The TokRepo CLI installs the right wiring for your tool — for Codex CLI it ships `AGENTS.md` instructions explaining when to invoke the framework, for Cursor it adds rules. The underlying Python install is unchanged.
How does LangGraph compare to CAMEL?
LangGraph is structure-first: you draw a state machine and the agents fit into it. CAMEL is dialogue-first: you assign roles and let agents converse. LangGraph wins for production reliability and checkpointing; CAMEL wins for research, simulations, and any case where the conversation itself is the artifact. Many production setups use LangGraph for orchestration and call CAMEL for specific dialogue tasks.
What's the difference vs the Memory Layer pack?
Memory is about what an agent remembers between sessions. Multi-agent is about how multiple agents coordinate within one task. They're orthogonal: a multi-agent system often needs a shared memory layer (Mem0/Zep) so the workers don't have to re-discover facts the planner already knew. We recommend installing both packs if you're building anything serious.
When should I NOT use a multi-agent framework?
When the task is sequential and stateful (refactor this file), latency-sensitive (chat UIs under 3s), or simple enough for one Claude/GPT call. Anthropic's own multi-agent research blog notes that single-agent + extended thinking beats most multi-agent setups on cost. Reach for multi-agent when the task naturally parallelizes (research many sources) or requires distinct expert roles.