Legacy Code Onboarding Kit — Land in a 10-Year Codebase
You joined a team. The repo is 10 years old, 800k LOC, three dead microservices, one heroic engineer who left. Ten picks that turn an AI agent into your onboarding partner: index the repo, map the architecture, generate call graphs, summarize modules, flag dead code, then write the CLAUDE.md/AGENTS.md the team never had. Codebase Memory MCP + Codebase Explorer + Pattern Finder + ast-index + CodeGraphContext + Graphify + Legacy Modernizer + Technical Debt Manager + CLAUDE.md template + Architect Reviewer.
Who this is for
You just joined a team. The repo is 8 to 12 years old, hundreds of thousands of lines, three half-dead microservices, conflicting naming conventions, and the one engineer who knew it all left two months ago. There is no ARCHITECTURE.md. The README still references Python 2. The wiki is on a Confluence instance nobody can find the URL for.
This is the situation a curated AI toolchain actually solves. Not by reading the code for you — by giving you a navigable map fast enough that week one becomes productive instead of demoralising.
This pack is opinionated about order. Each tool in the chain produces an artifact the next tool consumes. If you install them at random you get noise; if you install them in this sequence you get a working mental model by Friday.
Install in this order
- Codebase Memory MCP — index first. Persistent code knowledge graph that an MCP-aware agent (Claude Code, Cursor, Windsurf) can query directly. Run it on day one so every later question is answered against a real index instead of grep guesses.
- Claude Code Agent: Codebase Explorer — broad map. Walks the repo and produces a high-level module/service inventory: where the entry points are, which packages call which, what the test layout looks like.
- Claude Code Agent: Codebase Pattern Finder — recurring patterns. Surfaces repeated idioms, custom abstractions, and the tribal-knowledge wrappers that veterans built on top of the standard library. This is where you learn the team's local dialect.
- ast-index — fast structural search. Tree-sitter-backed CLI for AST-level queries (find every place that calls X with arg shape Y). Replaces ripgrep when you need structure, not text.
- CodeGraphContext — call graph as MCP. Builds a queryable graph of function and class relationships. Now the agent can answer "what calls this?" and "what breaks if I change this signature?" with edges, not vibes.
- Graphify — repo-wide dependency graph. Layers module-to-module and file-to-file edges on top of the call graph. The diagram you wish someone had drawn five years ago, generated in five minutes.
- Claude Code Agent: Legacy Modernizer — module summaries. Reads each module against the graphs above and writes a paragraph: what it does, who calls it, what it depends on, what looks rotten. Save these as docs/modules/*.md — your team will thank you.
- Claude Code Agent: Technical Debt Manager — dead code and debt. Cross-references the call graph with git blame and test coverage to flag unreachable code, untested critical paths, and modules nobody has touched since 2019. Output is a triage list, not a delete script.
- Claude Code CLAUDE.md — Best Practices Template — write the file the team never had. Use the inventory + summaries + debt list to draft a CLAUDE.md (and a parallel AGENTS.md for tool-agnostic agents) that tells the next new hire where the bodies are buried.
- Claude Code Agent: Architect Reviewer — your first PR. Before you open a real PR, run the architect-reviewer agent on your diff. It catches the convention you didn't know existed, the layer you accidentally crossed, and the public API you broke by renaming.
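To make the difference between text search and structural search concrete, here is a minimal sketch that uses Python's standard ast module, not ast-index itself (its query syntax is not covered here). The helper name find_calls, the sample source, and the chosen "arg shape" (a call that passes a particular keyword argument) are all assumptions for illustration.

```python
import ast

# Hypothetical sample source. A text search for "retry(" with "attempts="
# would also match the string literal on the log() line.
SAMPLE = '''
def handler(req):
    retry(fetch, attempts=3)
    retry(fetch)
    log("retry(fetch, attempts=3)")
    client.retry(send, attempts=5)
'''


def find_calls(source: str, func_name: str, keyword: str) -> list[int]:
    """Return line numbers of calls to `func_name` that pass `keyword=`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        target = node.func
        # Handle bare names (retry(...)) and attribute calls (client.retry(...)).
        if isinstance(target, ast.Name):
            name = target.id
        else:
            name = getattr(target, "attr", None)
        if name == func_name and any(kw.arg == keyword for kw in node.keywords):
            hits.append(node.lineno)
    return sorted(hits)


print(find_calls(SAMPLE, "retry", "attempts"))
```

The call without the keyword and the string literal are both skipped, which is the kind of precision a plain text search cannot offer.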
How the chain produces value
Codebase Memory MCP ───┐
                       ├── shared index
ast-index ─────────────┤
                       │
Codebase Explorer ─────┤───► module/service inventory
Pattern Finder ────────┤───► team-specific idioms
                       │
CodeGraphContext ──────┤───► call edges
Graphify ──────────────┤───► module edges
                       │
Legacy Modernizer ─────┴───► per-module summaries (docs/modules/*.md)
Technical Debt Manager ────► triage list (docs/debt.md)
          │
          ▼
CLAUDE.md + AGENTS.md ◄── you write this
          │
          ▼
first PR + Architect Reviewer pass
The critical insight: steps 1-6 build structured context the team has been missing. Steps 7-8 turn that context into prose and triage. Step 9 is the writeup that future engineers (and future agents) will rely on. Step 10 keeps your first PR from embarrassing you.
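As a rough illustration of how call edges (steps 5-6) can feed a triage list (step 8), the sketch below flags functions that no entry point reaches and answers the "what calls this?" query. The edge data, function names, and entry point are hypothetical. A real debt pass would also weigh git history and test coverage, which this sketch ignores.

```python
from collections import deque

# Hypothetical call edges: caller -> list of callees.
CALLS = {
    "main": ["load_config", "serve"],
    "serve": ["handle_request"],
    "handle_request": ["render"],
    "legacy_export": ["render_v1"],
    "load_config": [],
    "render": [],
    "render_v1": [],
}


def unreachable(calls, entry_points):
    """Return functions not reachable from any entry point, sorted by name."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:  # breadth-first walk over call edges
        for callee in calls.get(queue.popleft(), []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    everything = set(calls) | {c for callees in calls.values() for c in callees}
    return sorted(everything - seen)


def callers_of(calls, target):
    """Return the direct callers of `target`."""
    return sorted(caller for caller, callees in calls.items() if target in callees)


for name in unreachable(CALLS, ["main"]):
    print(f"FLAG for review: {name}")
```

The output is a list of candidates to review, not a delete list: anything invoked by a scheduler or by dynamic dispatch would show up here as a false positive.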
Tradeoffs you'll hit
- Index everything vs index incrementally — Codebase Memory MCP and CodeGraphContext both want a full index up front. On a 1M-line repo this can take 30-90 minutes the first time. Run it overnight; the rest of the week is sub-second queries.
- MCP server vs CLI — ast-index works as a plain CLI; CodeGraphContext and Codebase Memory MCP shine inside an MCP-aware agent. Pick CLI if you're stuck on a tool that doesn't speak MCP; pick MCP if your agent does.
- Auto-generated docs vs handwritten — the per-module summaries the Legacy Modernizer agent produces are a first draft, not a final document. Treat them as starting material you edit, not as authoritative.
- Dead code: flag vs delete — never let an agent delete dead code on a legacy repo. It will be wrong about cron-scheduled callers, dynamic dispatch, and reflection. Flag, review with a veteran, delete by hand.
- CLAUDE.md vs AGENTS.md — Claude Code reads CLAUDE.md, generic agents read AGENTS.md. Keep them as the same content with two filenames (symlink or generate from one source) so multi-tool teams don't fork.
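For the "generate from one source" option, a small sync script is often enough. The following is a minimal sketch. The source path docs/agent-guide.md and the helper name sync_agent_docs are assumptions, not a convention any of these tools requires.

```python
from pathlib import Path


def sync_agent_docs(source: Path, targets: list[Path]) -> list[Path]:
    """Copy `source` to every target whose content differs; return what changed."""
    content = source.read_text(encoding="utf-8")
    changed = []
    for target in targets:
        if not target.exists() or target.read_text(encoding="utf-8") != content:
            target.write_text(content, encoding="utf-8")
            changed.append(target)
    return changed


if __name__ == "__main__":
    root = Path(".")
    source = root / "docs" / "agent-guide.md"  # hypothetical single source of truth
    if source.exists():
        for path in sync_agent_docs(source, [root / "CLAUDE.md", root / "AGENTS.md"]):
            print(f"updated {path}")
```

Copying rather than symlinking keeps the files readable on platforms where creating symlinks needs extra privileges. Running the script in CI and failing when it reports changes is one way to catch drift.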
Common pitfalls
- Skipping the index step — every later tool degrades to grep-quality output without it. Don't.
- Trusting the call graph 100% — reflection, dependency injection, and YAML-configured routes are invisible to static analysis. Verify hot paths with a runtime trace before refactoring.
- Writing CLAUDE.md from agent output alone — talk to a veteran for 30 minutes before finalising. They will name three landmines no graph can see.
- Modernising before you understand — Legacy Modernizer is named provocatively. Use it to summarise first, modernise much later. Week-one rewrites are how new hires get reverted.
- Skipping Architect Reviewer on the first PR — first impressions matter. A 10-minute reviewer pass catches the convention violation that would otherwise eat your code-review thread for three days.
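The call-graph pitfall above is easy to reproduce. The sketch below, built only on Python's standard library, compares what a simple static pass sees with what a runtime trace records when a callee name is assembled at runtime. The sample source and the helper names static_callees and traced_callees are hypothetical.

```python
import ast
import sys

# Hypothetical module that dispatches dynamically.
SOURCE = '''
def export_csv():
    return "csv"

def export_json():
    return "json"

def run(fmt):
    # The callee name is built at runtime, so no call edge is visible in the AST.
    return globals()["export_" + fmt]()
'''


def static_callees(source):
    """Names called directly by identifier, as a simple static pass sees them."""
    return {
        node.func.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }


def traced_callees(source, entry, *args):
    """Run `entry` under sys.setprofile and record the sample's functions that ran."""
    namespace = {}
    exec(compile(source, "<sample>", "exec"), namespace)
    called = set()

    def profiler(frame, event, arg):
        # Only record Python-level calls that originate from the sample source.
        if event == "call" and frame.f_code.co_filename == "<sample>":
            called.add(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        namespace[entry](*args)
    finally:
        sys.setprofile(None)
    return called


print("static :", sorted(static_callees(SOURCE)))
print("runtime:", sorted(traced_callees(SOURCE, "run", "json")))
```

The static pass never sees export_json, while the trace does. On a real codebase the same comparison is worth doing for each hot path before you refactor it.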
Frequently asked questions
Won't an AI agent just hallucinate the architecture?
It will if you let it answer from raw file context. That's exactly why the first three picks (Codebase Memory MCP, Codebase Explorer, Pattern Finder) exist: they build a structured index the agent queries instead of guessing. Hallucination drops sharply when the answer is grounded in a real graph rather than a 200k-token window of best-guess source files. Verify the first few answers against the code by hand; once they line up, you can reasonably extend more trust to the rest.
How long does indexing a million-line repo actually take?
Plan 30 to 90 minutes for the first run of Codebase Memory MCP and CodeGraphContext on a mid-range laptop. ast-index is much faster, usually under five minutes. Run the heavy indexers overnight on day one; subsequent queries are sub-second. Incremental reindex on file save is supported by all three and adds only milliseconds per change.
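Why is the first run slow and later runs cheap? One common approach to incremental indexing is to compare content hashes and reprocess only the files that changed. The sketch below shows that generic idea. It is not a description of how any of the three tools implements it, and the helper names and the *.py pattern are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, used as a cheap change detector."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, previous: dict[str, str], pattern: str = "*.py"):
    """Compare current hashes with `previous`; return (to_reindex, new_state)."""
    current = {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }
    to_reindex = sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )
    return to_reindex, current


# Demo in a throwaway directory: first pass is a full index, second is incremental.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "a.py").write_text("x = 1\n")
    (repo / "b.py").write_text("y = 2\n")
    first, state = changed_files(repo, {})
    (repo / "b.py").write_text("y = 3\n")
    second, state = changed_files(repo, state)
    print("first pass :", first)
    print("second pass:", second)
```

Deleted files are not handled here. A fuller version would also drop index entries whose paths no longer exist.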
Can I really write a useful CLAUDE.md in week one?
Yes, if you treat the output as a draft and pair it with one veteran review. The per-module summaries, debt triage, and team-idiom list give you a structured first draft most teams have never had. Spend 30 minutes with a long-tenured engineer marking what's wrong, then ship version 0.1 of CLAUDE.md and AGENTS.md. Iterate weekly. The team will start contributing once they see something concrete to correct.
Why both CLAUDE.md and AGENTS.md — isn't one enough?
Claude Code reads CLAUDE.md by convention; tool-agnostic agents (Codex, Cursor, OpenCode, generic agent SDKs) increasingly look for AGENTS.md. Keeping both with the same content covers every agent the team might use in the next two years. Symlink them or generate from a single source-of-truth file to avoid drift.
What if my repo is private and I can't send code to a cloud LLM?
All ten picks support local-first or self-hosted deployment. Codebase Memory MCP, CodeGraphContext, Graphify, and ast-index run entirely on your machine. The Claude Code subagents and CLAUDE.md template work with any model the agent can talk to, including local Ollama or vLLM endpoints. You can complete the entire pack without a single byte of code leaving the laptop.