Stack for Planning Agents
Ten picks for devs building plan-then-execute agents (ReAct, ReWOO, Plan-and-Solve): planning frameworks, task decomposition, persistent plans, agents with a world model, and the eval harness that proves the plan held.
What's in this pack
This is the stack you build when the single-shot ReAct loop hits its first 30-step trajectory, drifts off goal at step 14, and you realize the agent never had a plan — it had momentum.
Every pick is for the dev shipping a plan-then-execute agent: the agent writes a plan, executes against it, then revises. That's the ReAct family (interleaved reason+act), ReWOO (plan everything up-front, no observations during planning), and Plan-and-Solve (decompose then solve each step). The picks cover the four layers that have to exist before any pattern ships: pattern literacy, orchestration, decomposition + persistence, and review / world model / eval.
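The plan, execute, revise cycle described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation of any tool in this pack: `run_plan_then_execute`, `stub_planner`, and `stub_executor` are hypothetical names, and the stubs stand in for an LLM call and real tools.

```python
from typing import Callable

# Hypothetical signatures: a real agent would call an LLM and real tools here.
Planner = Callable[[str, list[str]], list[str]]   # (goal, notes) -> remaining steps
Executor = Callable[[str], tuple[bool, str]]      # step -> (ok, observation)


def run_plan_then_execute(goal: str, plan: Planner, execute: Executor,
                          max_revisions: int = 2) -> list[str]:
    """Write a plan, execute it step by step, and replan when a step fails."""
    notes: list[str] = []   # observations fed back into replanning
    done: list[str] = []    # steps that completed successfully
    steps = plan(goal, notes)
    revisions = 0
    while steps:
        step = steps.pop(0)
        ok, observation = execute(step)
        notes.append(f"{step}: {observation}")
        if ok:
            done.append(step)
            continue
        if revisions >= max_revisions:
            raise RuntimeError(f"gave up after {revisions} revisions at {step!r}")
        revisions += 1
        steps = plan(goal, notes)   # revise: replan the remaining work
    return done


def stub_planner(goal: str, notes: list[str]) -> list[str]:
    # Pretend the model switches parsers once it has seen the failure note.
    if any(n.startswith("parse csv: failed") for n in notes):
        return ["parse json", "summarize"]
    return ["fetch data", "parse csv", "summarize"]


def stub_executor(step: str) -> tuple[bool, str]:
    if step == "parse csv":
        return False, "failed, file is JSON"
    return True, "ok"


completed = run_plan_then_execute("summarize the export", stub_planner, stub_executor)
print(completed)
```

The revision budget is the important design choice here: without `max_revisions`, a planner that keeps proposing the same failing step would loop until the token budget runs out.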
This pack is deliberately different from the existing multi-agent-frameworks pack on TokRepo. That one is about how multiple agents talk to each other (CAMEL, LangGraph as orchestration, DeepAgents, GPT Researcher). This pack is about how a single agent — or the planner inside a swarm — actually plans. Zero overlapping workflow IDs.
| # | Asset | Role in the planning loop |
|---|---|---|
| 1 | AI Agent Design Patterns | Pattern catalog: ReAct, ReWOO, Plan-and-Solve, Reflexion |
| 2 | Awesome Agentic Patterns | Practical blueprints for planning + tool use |
| 3 | LangGraph | Stateful plan-execute graph with checkpointing |
| 4 | AutoGen | GroupChat-driven planning between specialized agents |
| 5 | Task Decomposition (Claude Code Agent) | Subagent that splits a goal into ordered subtasks |
| 6 | Planning with Files (Manus-style) | Persist plan.md to disk so the agent can resume |
| 7 | Plandex | Reference plan-execute agent for large codebases |
| 8 | Plannotator | Visual review of the plan + the diff it produced |
| 9 | Agentic World Modeling | Research directory on agents that model environments |
| 10 | Agent Evaluation | Test virtual agents in CI — proves the new plan was actually better |
Install in this order (read → orchestrate → decompose → persist → execute → review → eval)
- AI Agent Design Patterns — start here. Know the three families: ReAct interleaves Reason → Act → Observe (cheap, robust, drifts on long horizons); ReWOO plans every tool call up-front then executes (saves tokens, fragile to a bad first plan); Plan-and-Solve decomposes then solves each subtask (the default for long-horizon work).
- Awesome Agentic Patterns — once you know the patterns, you need the recipes. A blueprint directory for planning, tool use, reflection, and recovery. Use it as a lookup when you hit a design question.
- LangGraph — the orchestration framework most plan-execute agents end up on. Plan is a node, executor is a node, revision is a node; LangGraph carries state + checkpoints. Crash at step 17? Resume from the checkpoint. This pick is the workflow-focused entry, distinct from the orchestration entry in the multi-agent pack.
- AutoGen — when planning is a conversation (planner + critic + executor), GroupChat is the cleanest abstraction. Declare agents and a speaker policy; the framework runs the dialogue until the goal is reached. Use it for planning-via-conversation; use LangGraph when the plan is a graph.
- Task Decomposition (Claude Code Agent): the subagent that does the actual decomposition. Drop it into `.claude/agents/`, hand it a goal, and get back an ordered subtask list with dependencies. Production planners delegate this rather than asking the planner LLM to decompose mid-prompt.
- Planning with Files: Manus's signature trick of writing the plan to a file the agent re-reads every turn. Ships `plan.md`, `progress.md`, and append-only logs. The cheapest robustness upgrade in this pack: the agent can't lose track because it reads from disk, not memory.
- Plandex: the reference plan-execute agent whose source you can read. It decomposes a coding task, writes a plan, executes step by step, and asks for review at checkpoints. Run it on your own repo and watch the trajectory; it's the cheapest way to learn what a production planner looks like.
- Plannotator — once the agent has a plan and a diff, a human (or critic agent) reviews them side by side. Visual board: plan steps on one side, diff produced by each step on the other, inline annotation. Plug it between execute and merge.
- Agentic World Modeling — the frontier: agents that maintain an explicit world model (what's in the database, what files exist, what the user already said) so the plan is grounded in state, not hallucination. A curated research catalog — papers, benchmarks, codebases.
- Agent Evaluation — the loop closer. Without a plan eval, every prompt change is vibes. Scores trajectories on plan quality (was the plan optimal?), fidelity (did the agent follow it?), and outcome (was the goal met?). Run on every PR.
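An "ordered subtask list with dependencies" is a small dependency graph, and the standard library can turn it into an execution order. The sketch below assumes a hypothetical decomposer output format (subtask id mapped to the ids it depends on); the real Task Decomposition agent may emit something different.

```python
from graphlib import TopologicalSorter

# Hypothetical decomposer output: subtask id -> ids it depends on.
subtasks = {
    "write_migration": {"inspect_schema"},
    "update_models": {"inspect_schema"},
    "run_tests": {"write_migration", "update_models"},
    "inspect_schema": set(),
}


def execution_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into batches; everything in one batch can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()   # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = execution_batches(subtasks)
print(batches)
```

Rejecting cycles at this point is cheaper than discovering them mid-run, when the executor is already waiting on a subtask that can never become ready.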
How they fit together (plan-then-execute loop)
┌─────────────────────────────────────────────────────────────┐
│  READ FIRST                                                 │
│  AI Agent Design Patterns ──► ReAct / ReWOO / Plan-Solve    │
│  Awesome Agentic Patterns ──► named blueprints              │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  ORCHESTRATE → DECOMPOSE → PERSIST                          │
│  LangGraph (graph plan) OR AutoGen (dialogue plan)          │
│          │                                                  │
│          ▼                                                  │
│  Task Decomposition agent ──► ordered subtask list          │
│          │                                                  │
│          ▼                                                  │
│  Planning with Files      ──► plan.md / progress.md         │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  EXECUTE → REVIEW → REPLAN                                  │
│  Plandex-style loop       ──► step → tool → diff            │
│  Plannotator              ──► human/critic reviews diff     │
│  Agentic World Modeling   ──► update state, replan if off   │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  EVAL                                                       │
│  Agent Evaluation ◄── plan quality + fidelity + outcome     │
│  (runs on every PR in CI)                                   │
└─────────────────────────────────────────────────────────────┘
Read is one-time. Orchestrate / decompose / persist is setup — done once per agent. Execute / review / world model is the hot loop per task. Eval wraps everything and is the only feedback signal that survives prompt drift, model swaps, and library upgrades.
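To make the "plan is a node, executor is a node, state is checkpointed" idea concrete, here is a framework-free sketch. It is deliberately not LangGraph's API: `NODES`, `run`, the checkpoint file format, and the simulated crash are all assumptions chosen to show the mechanism in plain Python.

```python
import json
import tempfile
from pathlib import Path

FAIL_ONCE = {"step-b"}   # simulated transient crash, for the demo only
PLAN_CALLS = []          # records how often the plan node ran


def plan_node(state: dict) -> str:
    PLAN_CALLS.append(1)
    state["plan"] = ["step-a", "step-b", "step-c"]
    state["cursor"] = 0
    state["done"] = []
    return "execute"


def execute_node(state: dict) -> str:
    step = state["plan"][state["cursor"]]
    if step in FAIL_ONCE:
        FAIL_ONCE.discard(step)
        raise RuntimeError(f"crashed during {step}")
    state["done"].append(step)
    state["cursor"] += 1
    return "execute" if state["cursor"] < len(state["plan"]) else "end"


# Each node takes the state dict and returns the name of the next node.
NODES = {"plan": plan_node, "execute": execute_node}


def run(checkpoint: Path) -> dict:
    """Run nodes until 'end', saving a checkpoint after every node."""
    if checkpoint.exists():   # resume from the last completed node
        saved = json.loads(checkpoint.read_text())
        state, node = saved["state"], saved["next"]
    else:
        state, node = {}, "plan"
    while node != "end":
        node = NODES[node](state)
        checkpoint.write_text(json.dumps({"state": state, "next": node}))
    return state


ckpt = Path(tempfile.mkdtemp()) / "checkpoint.json"
try:
    run(ckpt)            # crashes at step-b, after step-a was checkpointed
except RuntimeError as err:
    print("first run:", err)
final = run(ckpt)        # second run resumes at step-b without replanning
print(final["done"])
```

Because the checkpoint is written only after a node returns, a crash inside a node leaves the last consistent state on disk, and the resumed run never repeats the planning step.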
Tradeoffs you'll hit
- ReAct vs ReWOO vs Plan-and-Solve. ReAct is the easy default but burns tokens on long horizons. ReWOO front-loads the whole plan in one call (cheap, fast) but is brittle when the first observation invalidates it. Plan-and-Solve is the middle ground used by Manus, Plandex, and most production planners: decompose once, execute with light revision. Start with Plan-and-Solve unless you have a reason not to.
- LangGraph vs AutoGen for planning. LangGraph treats the plan as a graph — state machine, checkpoints, deterministic pipelines. AutoGen treats it as a conversation — planner talks to executor talks to critic. Many mature setups use LangGraph as the outer graph and embed an AutoGen GroupChat inside one planner node.
- Plan in prompt vs plan on disk. In-context is fast but you lose it on restart and risk context bloat after step 10. Writing `plan.md` / `progress.md` to disk costs one or two tool calls per turn but makes resumption, audit, and human override trivial. Default to disk for anything that runs more than ~10 minutes.
- Single planner LLM vs decomposition subagent. One model doing decompose + execute + revise saves a hop but degrades on long lists, because models tend to under-decompose to look efficient. A dedicated subagent (own system prompt, lower temperature) produces flatter, more parallelizable lists. Use it when tasks routinely exceed ~5 subtasks.
- Plan eval vs outcome-only eval. Outcome-only misses agents that stumbled into the right answer despite a terrible plan — they regress next month. Plan-only misses agents that wrote a beautiful plan and ignored it. Score both, separately. The CI harness in this pack does both.
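The plan-on-disk option from the list above is small enough to sketch. The file names `plan.md` and `progress.md` come from the text; the checkbox format and the helpers `write_plan`, `next_step`, and `mark_done` are assumptions for illustration, not the layout that Planning with Files actually ships.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

TODO, DONE = "- [ ] ", "- [x] "   # assumed checkbox convention


def write_plan(workdir: Path, steps: list[str]) -> None:
    (workdir / "plan.md").write_text("".join(f"{TODO}{s}\n" for s in steps))


def next_step(workdir: Path) -> Optional[str]:
    """Re-read the plan from disk and return the first unchecked step."""
    for line in (workdir / "plan.md").read_text().splitlines():
        if line.startswith(TODO):
            return line[len(TODO):]
    return None


def mark_done(workdir: Path, step: str, note: str) -> None:
    plan = workdir / "plan.md"
    lines = plan.read_text().splitlines()
    for i, line in enumerate(lines):
        if line == f"{TODO}{step}":
            lines[i] = f"{DONE}{step}"
            break
    plan.write_text("\n".join(lines) + "\n")
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with (workdir / "progress.md").open("a") as log:   # append-only log
        log.write(f"- {stamp} {step}: {note}\n")


workdir = Path(tempfile.mkdtemp())
write_plan(workdir, ["clone repo", "run tests", "open PR"])
mark_done(workdir, "clone repo", "ok")

# A restarted process needs only the directory to know where it left off.
resumed_at = next_step(workdir)
print(resumed_at)
```

Keeping the plan human-readable is the point: the same file serves crash recovery, audit, and manual override (a person can tick or reorder steps in an editor).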
Common pitfalls
- No replanning trigger. A plan written at t=0 will be wrong by step 12 because the world changed. Build an explicit trigger: `if last_step_failed or world_state_diff > threshold then revise_plan`. Without it, agents march down dead trajectories until the budget runs out.
- Decomposition runaway. Naive decomposers turn one task into 40 subtasks, each into 40 more. Cap recursion depth, cap subtask count, and pin "prefer 3–7 subtasks at the top level" into the system prompt.
- No plan persistence. Crashing mid-run is not rare. Plan-only-in-context means starting over. Write it to a file from minute one.
- Plan + execute on the same model with no critic. Planner biases toward plans the executor can do; executor biases toward executing whatever the planner says. Add a separate critic (different system prompt, ideally different model) whose only job is to flag bad plans before execution. AutoGen's GroupChat makes this trivial.
- Evaluating the LLM, not the trajectory. It is easy to test "does the LLM output a plan that looks good" and miss that the agent never followed it. Evaluate the trajectory: did the executed actions match the plan? When they diverged, was it justified?
- Confusing planning with multi-agent. Many teams reach for a swarm when they needed a single planner with a good decomposer. Try plan-then-execute with one model first. Add a second agent only when you can name a specific role it plays (critic, world-model keeper, tool specialist).
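The first two pitfalls above lend themselves to small guards. The sketch below is one possible reading, with assumptions stated loudly: `world_state_diff` is taken to be the fraction of state keys whose value differs from what the plan expected, and `decompose` enforces the caps in code rather than relying on the prompt alone. All function names and default values are hypothetical.

```python
from typing import Callable

_MISSING = object()


def should_replan(last_step_failed: bool, expected: dict, observed: dict,
                  threshold: float = 0.25) -> bool:
    """Replan on failure, or when too much of the expected state has drifted."""
    if last_step_failed:
        return True
    keys = set(expected) | set(observed)
    if not keys:
        return False
    drifted = sum(
        1 for k in keys
        if expected.get(k, _MISSING) != observed.get(k, _MISSING)
    )
    return drifted / len(keys) > threshold


def decompose(task: str, split: Callable[[str], list[str]], depth: int = 0,
              max_depth: int = 2, max_children: int = 7) -> dict:
    """Recursively split a task while capping depth and fan-out."""
    if depth >= max_depth:
        return {"task": task, "children": []}
    children = split(task)[:max_children]   # drop anything past the cap
    return {
        "task": task,
        "children": [
            decompose(c, split, depth + 1, max_depth, max_children)
            for c in children
        ],
    }


def count_nodes(tree: dict) -> int:
    return 1 + sum(count_nodes(c) for c in tree["children"])


expected = {"tests": "passing", "branch": "main", "open_files": 3, "schema": "v1"}
minor_drift = dict(expected, open_files=4)
major_drift = dict(expected, open_files=4, schema="v2")
print(should_replan(False, expected, minor_drift))   # 1 of 4 keys changed
print(should_replan(False, expected, major_drift))   # 2 of 4 keys changed


def runaway(task: str) -> list[str]:
    # A naive splitter that always proposes 40 subtasks.
    return [f"{task}.{i}" for i in range(40)]


tree = decompose("ship feature", runaway)
print(count_nodes(tree))
```

Truncating at `max_children` is the bluntest possible policy; a gentler variant would send the oversized list back to the decomposer and ask it to merge subtasks instead.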
When this pack alone isn't enough
Planning agents shine on long-horizon, multi-step, recoverable tasks. They lose on short latency-sensitive interactions (just ReAct it), tasks without a clear success criterion (build the eval first), and tasks where the executor is itself a swarm (install the multi-agent-frameworks pack alongside). The right adoption path: read the patterns guide, build the smallest plan-execute loop in LangGraph with file-persisted plans, run Plandex on a real repo to see what "good" looks like, then bolt on the eval harness before you touch the prompt a second time.
Frequently asked questions
How is this pack different from the existing `multi-agent-frameworks` pack on TokRepo?
Different layer of the stack. multi-agent-frameworks is about how multiple agents coordinate within one task — CAMEL roles, LangGraph as a team orchestrator, DeepAgents spawning sub-agents, GPT Researcher as a swarm reference. This pack is about how a single agent plans before it acts — pattern catalogs (ReAct/ReWOO/Plan-and-Solve), task decomposition, plan persistence to disk, plan review, and agent eval. Zero overlapping workflow IDs. They pair well: install this pack for the planner, install multi-agent-frameworks if the executor is a team of agents.
Should I use LangGraph or AutoGen for planning?
LangGraph for plans that are graphs you can draw — deterministic pipelines, long-running jobs that need checkpointing, anything you want to resume after a crash. AutoGen for plans that emerge from a conversation between specialized agents — planner + critic + executor talking until consensus. Many production setups use LangGraph as the outer state machine and embed an AutoGen GroupChat inside one planner node when they need conversational planning for that step. Start with LangGraph; reach for AutoGen when the planning step itself benefits from multi-turn dialogue.
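For readers who have not seen planning-as-conversation, the following sketch shows its shape in plain Python. It is not AutoGen's API: the three agents are stub functions, and `next_speaker` is a hypothetical speaker policy standing in for what a group-chat framework would manage.

```python
from typing import Callable, Optional

Message = tuple[str, str]                  # (speaker, content)
Agent = Callable[[list[Message]], str]


def last_plan(history: list[Message]) -> str:
    return [text for who, text in history if who == "planner"][-1]


def planner(history: list[Message]) -> str:
    rejected = any(who == "critic" and text.startswith("REJECT")
                   for who, text in history)
    return "plan v2: back up, then migrate" if rejected else "plan v1: migrate"


def critic(history: list[Message]) -> str:
    return "APPROVE" if "back up" in last_plan(history) else "REJECT: no backup step"


def executor(history: list[Message]) -> str:
    return f"DONE: executed {last_plan(history)}"


AGENTS: dict[str, Agent] = {"planner": planner, "critic": critic, "executor": executor}


def next_speaker(history: list[Message]) -> Optional[str]:
    """Speaker policy: planner -> critic -> executor if approved, else planner."""
    if not history:
        return "planner"
    who, text = history[-1]
    if who == "planner":
        return "critic"
    if who == "critic":
        return "executor" if text == "APPROVE" else "planner"
    return None   # the executor has spoken, so the conversation is over


def run_chat(max_turns: int = 10) -> list[Message]:
    history: list[Message] = []
    for _ in range(max_turns):   # hard cap so a stubborn critic cannot loop forever
        speaker = next_speaker(history)
        if speaker is None:
            break
        history.append((speaker, AGENTS[speaker](history)))
    return history


transcript = run_chat()
for who, text in transcript:
    print(f"{who}: {text}")
```

If the speaker policy can be written down this explicitly, the plan is arguably already a graph, which is one practical test for choosing between the two styles.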
Is Manus-style plan-on-disk really worth the tool calls?
Yes, for anything that runs more than about ten minutes. Two real wins: resumption (crash recovery is reading a file, not replaying a trajectory) and auditability (humans can read the plan and the progress log without parsing LLM context). The cost is one or two tool calls per turn to read and write plan.md / progress.md, which is a rounding error compared to the LLM call itself. For short interactive sessions it's overkill; for any long-horizon agentic loop it's the single highest-ROI reliability upgrade in this pack.
Why include both LangGraph here and in the multi-agent-frameworks pack?
Different workflow entries pointing at different facets of the same library. The multi-agent pack uses the orchestration-focused entry — LangGraph as the team coordinator. This pack uses the stateful-workflows entry — LangGraph as the plan-execute graph carrying one agent through many steps. Same library, different blueprints. You install LangGraph once; the two packs teach you two different ways to use it. Workflow IDs are distinct so there's no double-counting in your installed list.
What's the smallest viable agent-eval set I can start with?
Twenty to thirty hand-written trajectories are enough to detect regressions meaningfully. For each, record the goal, the ideal plan (your hand-written reference), the executed trajectory, and the outcome. Score three things separately: plan quality vs reference, fidelity (did execution follow the plan?), and outcome (was the goal met?). Build it in the first week, before you tune the planner prompt. Every real production failure becomes one more case. By month three you can expect 100+ trajectories, with every prompt change running them in CI. That loop, not vibes, is what drives planner quality.
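A first version of the three-way scoring can be very small. The sketch below is an assumption-heavy starting point rather than the pack's Agent Evaluation harness: the `Case` fields and `score` function are hypothetical, and plan steps are compared by exact string match, where a real harness would more likely use a rubric or a judge model.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Case:
    goal: str
    reference_plan: list[str]   # hand-written ideal plan
    agent_plan: list[str]       # plan the agent wrote
    executed: list[str]         # actions the agent actually took
    goal_met: bool


def score(case: Case) -> dict[str, float]:
    """Score plan quality, fidelity, and outcome separately."""
    # Plan quality: ordered overlap between the agent's plan and the reference.
    plan_quality = SequenceMatcher(None, case.reference_plan, case.agent_plan).ratio()
    # Fidelity: share of executed actions that were actually in the agent's plan.
    planned = set(case.agent_plan)
    fidelity = (
        sum(1 for action in case.executed if action in planned) / len(case.executed)
        if case.executed else 0.0
    )
    return {
        "plan_quality": round(plan_quality, 2),
        "fidelity": round(fidelity, 2),
        "outcome": 1.0 if case.goal_met else 0.0,
    }


case = Case(
    goal="add a nullable column",
    reference_plan=["inspect schema", "write migration", "run tests"],
    agent_plan=["write migration", "run tests"],
    executed=["write migration", "drop table", "run tests"],
    goal_met=True,
)
scores = score(case)
print(scores)
```

This case shows why the scores stay separate: the outcome is a pass, yet the plan skipped a reference step and the agent ran an unplanned `drop table`, which an outcome-only eval would never surface.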