Legacy Code Onboarding Kit — Land in a 10-Year Codebase

You joined a team. The repo is 10 years old, 800k LOC, with three dead microservices and one heroic engineer who has already left. Ten picks that turn an AI agent into your onboarding partner: index the repo, map the architecture, generate call graphs, summarize modules, flag dead code, then write the CLAUDE.md/AGENTS.md the team never had. The picks: Codebase Memory MCP, Codebase Explorer, Pattern Finder, ast-index, CodeGraphContext, Graphify, Legacy Modernizer, Technical Debt Manager, the CLAUDE.md template, and Architect Reviewer.

Who this is for

You just joined a team. The repo is 8 to 12 years old, hundreds of thousands of lines, with three half-dead microservices, conflicting naming conventions, and the one engineer who knew it all left two months ago. There is no ARCHITECTURE.md. The README still references Python 2. The wiki lives on a Confluence instance whose URL nobody can find.

This is the situation a curated AI toolchain actually solves. Not by reading the code for you — by giving you a navigable map fast enough that week one becomes productive instead of demoralising.

This pack is opinionated about order. Each tool in the chain produces an artifact the next tool consumes. If you install them at random you get noise; if you install them in this sequence you get a working mental model by Friday.

Install in this order

  1. Codebase Memory MCP — index first. Persistent code knowledge graph that an MCP-aware agent (Claude Code, Cursor, Windsurf) can query directly. Run it on day one so every later question is answered against a real index instead of grep guesses.
  2. Claude Code Agent: Codebase Explorer — broad map. Walks the repo and produces a high-level module/service inventory: where the entry points are, which packages call which, what the test layout looks like.
  3. Claude Code Agent: Codebase Pattern Finder — recurring patterns. Surfaces repeated idioms, custom abstractions, and the tribal-knowledge wrappers that veterans built on top of the standard library. This is where you learn the team's local dialect.
  4. ast-index — fast structural search. Tree-sitter-backed CLI for AST-level queries (find every place that calls X with arg shape Y). Replaces ripgrep when you need structure, not text.
  5. CodeGraphContext — call graph as MCP. Builds a queryable graph of function and class relationships. Now the agent can answer what calls this? and what breaks if I change this signature? with edges, not vibes.
  6. Graphify — repo-wide dependency graph. Layers module-to-module and file-to-file edges on top of the call graph. The diagram you wish someone had drawn five years ago, generated in five minutes.
  7. Claude Code Agent: Legacy Modernizer — module summaries. Reads each module against the graphs above and writes a paragraph: what it does, who calls it, what it depends on, what looks rotten. Save these as docs/modules/*.md — your team will thank you.
  8. Claude Code Agent: Technical Debt Manager — dead code and debt. Cross-references the call graph with git blame and test coverage to flag unreachable code, untested critical paths, and modules nobody has touched since 2019. Output is a triage list, not a delete script.
  9. Claude Code CLAUDE.md — Best Practices Template — write the file the team never had. Use the inventory + summaries + debt list to draft a CLAUDE.md (and parallel AGENTS.md for tool-agnostic agents) that tells the next new hire where the bodies are buried.
  10. Claude Code Agent: Architect Reviewer — your first PR. Before you open a real PR, run the architect-reviewer agent on your diff. It catches the convention you didn't know existed, the layer you accidentally crossed, and the public API you broke by renaming.
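
To make step 4's "structure, not text" concrete, here is a small sketch. It uses Python's standard `ast` module as a stand-in; it is not ast-index's own query interface, which this page does not document. The `SOURCE` snippet and the `calls_missing_keyword` helper are invented for illustration. The query finds calls to a function that omit a given keyword argument, something a line-based text search gets wrong as soon as a call spans several lines.

```python
import ast

# Hypothetical code under inspection; it is parsed, never executed.
SOURCE = '''
import requests

def fetch(url):
    return requests.get(
        url,
        timeout=5,
    )

def fetch_unsafe(url):
    return requests.get(url)
'''


def calls_missing_keyword(source: str, func_name: str, keyword: str) -> list[int]:
    """Return line numbers of calls to `func_name` that omit `keyword`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # ast.unparse turns the callee back into dotted text, e.g. "requests.get"
        if ast.unparse(node.func) != func_name:
            continue
        if not any(kw.arg == keyword for kw in node.keywords):
            hits.append(node.lineno)
    return hits


print(calls_missing_keyword(SOURCE, "requests.get", "timeout"))  # prints [11]
```

A grep for lines containing `requests.get(` but not `timeout` would also flag the multi-line call in `fetch`, which is fine. The AST query reports only the call that really lacks the argument.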

How the chain produces value

Codebase Memory MCP  ──┐
                       ├── shared index
ast-index ─────────────┤
                       │
Codebase Explorer ─────┤───►  module/service inventory
Pattern Finder ────────┤───►  team-specific idioms
                       │
CodeGraphContext ──────┤───►  call edges
Graphify ──────────────┤───►  module edges
                       │
Legacy Modernizer ─────┴───►  per-module summaries (docs/modules/*.md)
Technical Debt Manager ────►  triage list (docs/debt.md)
                            │
                            ▼
        CLAUDE.md  +  AGENTS.md  ◄── you write this
                            │
                            ▼
         first PR + Architect Reviewer pass

The critical insight: steps 1-6 build structured context the team has been missing. Steps 7-8 turn that context into prose and triage. Step 9 is the writeup that future engineers (and future agents) will rely on. Step 10 keeps your first PR from embarrassing you.
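
The "structured context" from steps 5 and 6 is, at bottom, a set of edges you can traverse. The sketch below shows the shape of a "what breaks if I change this signature?" query as a reverse traversal over call edges. The edge list and function names are hypothetical, and a real graph would come from the indexer rather than a hand-written list; this is not the query API of CodeGraphContext or Graphify.

```python
from collections import defaultdict, deque

# Hypothetical (caller, callee) edges, invented for illustration.
CALL_EDGES = [
    ("api.create_order", "billing.charge"),
    ("cron.retry_failed", "billing.charge"),
    ("billing.charge", "billing.round_cents"),
    ("reports.monthly", "billing.round_cents"),
]


def transitive_callers(edges, target):
    """Return every function that reaches `target` directly or indirectly."""
    callers_of = defaultdict(set)
    for caller, callee in edges:
        callers_of[callee].add(caller)

    seen, queue = set(), deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers_of[current]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(transitive_callers(CALL_EDGES, "billing.round_cents")))
# prints ['api.create_order', 'billing.charge', 'cron.retry_failed', 'reports.monthly']
```

Changing `billing.round_cents` touches four callers here, two of them indirectly. That list is what you would paste into the per-module summary in step 7.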

Tradeoffs you'll hit

  • Index everything vs index incrementally — Codebase Memory MCP and CodeGraphContext both want a full index up front. On a 1M-line repo this can take 30-90 minutes the first time. Run it overnight; the rest of the week is sub-second queries.
  • MCP server vs CLI — ast-index works as a plain CLI; CodeGraphContext and Codebase Memory MCP shine inside an MCP-aware agent. Pick CLI if you're stuck on a tool that doesn't speak MCP; pick MCP if your agent does.
  • Auto-generated docs vs handwritten — the per-module summaries the Legacy Modernizer agent produces are a first draft, not a final document. Treat them as starting material you edit, not as authoritative.
  • Dead code: flag vs delete — never let an agent delete dead code on a legacy repo. It will be wrong about cron-scheduled callers, dynamic dispatch, and reflection. Flag, review with a veteran, delete by hand.
  • CLAUDE.md vs AGENTS.md — Claude Code reads CLAUDE.md, generic agents read AGENTS.md. Keep them as the same content with two filenames (symlink or generate from one source) so multi-tool teams don't fork.
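
The flag-versus-delete tradeoff is easier to see with a toy reachability check. This is a minimal sketch under stated assumptions: the edges, function names, and entry-point list are hypothetical, and this is not how Technical Debt Manager computes its triage list. It walks the call graph from the known entry points and emits review checkboxes for everything it cannot reach.

```python
def flag_unreachable(edges, all_functions, entry_points):
    """Return functions with no static path from an entry point, for human review."""
    callees = {}
    for caller, callee in edges:
        callees.setdefault(caller, set()).add(callee)

    reachable, stack = set(), list(entry_points)
    while stack:
        fn = stack.pop()
        if fn in reachable:
            continue
        reachable.add(fn)
        stack.extend(callees.get(fn, ()))
    return sorted(set(all_functions) - reachable)


def triage_lines(candidates):
    """Format candidates as a checklist; nothing here deletes code."""
    return [f"- [ ] {name}: no static path from an entry point, confirm with a veteran"
            for name in candidates]


# Hypothetical data, invented for illustration.
EDGES = [
    ("api.create_order", "billing.charge"),
    ("cron.retry_failed", "billing.charge"),
    ("legacy.export_csv", "legacy.format_row"),
]
FUNCTIONS = {name for edge in EDGES for name in edge}
ENTRY_POINTS = ["api.create_order"]  # the cron job lives in a crontab, so nobody listed it

for line in triage_lines(flag_unreachable(EDGES, FUNCTIONS, ENTRY_POINTS)):
    print(line)
```

Note that `cron.retry_failed` is flagged even though it runs every night. That false positive is the reason the output is a checklist for a veteran and not a delete script.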

Common pitfalls

  • Skipping the index step — every later tool degrades to grep-quality output without it. Don't.
  • Trusting the call graph 100% — reflection, dependency injection, and YAML-configured routes are invisible to static analysis. Verify hot paths with a runtime trace before refactoring.
  • Writing CLAUDE.md from agent output alone — talk to a veteran for 30 minutes before finalising. They will name three landmines no graph can see.
  • Modernising before you understand — Legacy Modernizer is named provocatively. Use it to summarise first, modernise much later. Week-one rewrites are how new hires get reverted.
  • Skipping Architect Reviewer on the first PR — first impressions matter. A 10-minute reviewer pass catches the convention violation that would otherwise eat your code-review thread for three days.
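
For the call-graph pitfall above, a runtime trace can be as small as the sketch below. It assumes a Python codebase and uses the standard `sys.setprofile` hook to record which functions actually call which, then diffs that against the edges a static tool reported. The handler functions, the `HANDLERS` table, and `STATIC_EDGES` are hypothetical; on a real service you would more likely reach for an existing profiler or tracing setup.

```python
import sys


def record_call_edges(func, *args, **kwargs):
    """Run `func` and return the (caller, callee) name pairs observed at runtime."""
    edges = set()

    def profiler(frame, event, arg):
        # "call" fires for Python-level calls; C calls use "c_call" and are skipped.
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            if caller != "record_call_edges":  # drop the harness's own edge
                edges.add((caller, frame.f_code.co_name))

    sys.setprofile(profiler)
    try:
        func(*args, **kwargs)
    finally:
        sys.setprofile(None)
    return edges


# Hypothetical code under test: dispatch through a table hides the edge from static analysis.
def send_email(msg):
    return f"email:{msg}"


def send_sms(msg):
    return f"sms:{msg}"


HANDLERS = {"email": send_email, "sms": send_sms}


def notify(channel, msg):
    return HANDLERS[channel](msg)


STATIC_EDGES = set()  # assumption: the static graph found no callees for notify
observed = record_call_edges(notify, "sms", "hi")
print(sorted(observed - STATIC_EDGES))  # prints [('notify', 'send_sms')]
```

Any edge that shows up at runtime but not in the graph marks a place where the static tools are blind, and a place to tread carefully before refactoring.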

INSTALL · ONE COMMAND

$ tokrepo install pack/legacy-code-onboarding

Hand it to your agent, or paste it into your terminal.

What's inside

10 assets in this pack

MCP#01
Codebase Memory MCP — Code Intelligence for AI Agents

High-performance code intelligence MCP server. Indexes repos in milliseconds via tree-sitter AST, supports 66 languages, sub-ms graph queries. MIT, 1,300+ stars.

by MCP Hub·220 views
$ tokrepo install codebase-memory-mcp-code-intelligence-ai-agents-a3fe5165
Skill#02
Claude Code Agent: Codebase Explorer

by TokRepo精选·24 views
$ tokrepo install claude-code-agent-codebase-explorer-7b3ad600
Skill#03
Claude Code Agent: Codebase Pattern Finder

Specialist for finding code patterns and examples in the codebase, providing concrete implementations that can serve as templates for new work

by TokRepo精选·21 views
$ tokrepo install claude-code-agent-codebase-pattern-finder-bb32e488
Script#04
ast-index — Fast Code Search for Agents

ast-index builds an AST index for fast code search and integrates with Claude/Codex/Cursor; verified 347★ and installs via Homebrew.

by Script Depot·80 views
$ tokrepo install ast-index-fast-code-search-for-agents
MCP#05
CodeGraphContext — Graph Index for Code + MCP

CodeGraphContext indexes a repo into a code graph so developers and agents can query call chains, dependencies, and architecture via CLI or MCP mode.

by MCP Hub·56 views
$ tokrepo install codegraphcontext-graph-index-for-code-mcp
Script#06
Graphify — Repo Knowledge Graph + MCP

Graphify extracts docs/code into a knowledge graph and can install as an MCP/skill across Claude Code, Cursor, Codex, and Gemini CLI. Install via uv/pipx.

by Script Depot·107 views
$ tokrepo install graphify-repo-knowledge-graph-mcp
Skill#07
Claude Code Agent: Legacy Modernizer

Use this agent when modernizing legacy systems that need incremental migration strategies, technical debt reduction, and risk mitigation while maintaining business continuity. Example context: A development team has a 15-year-old mono...

by TokRepo精选·23 views
$ tokrepo install claude-code-agent-legacy-modernizer-dccb0175
Skill#08
Claude Code Agent: Technical Debt Manager

Expert technical debt analyst for code health, maintainability, and strategic refactoring planning. Use PROACTIVELY when codebase shows complexity growth, when planning...

by TokRepo精选·25 views
$ tokrepo install claude-code-agent-technical-debt-manager-6285fca6
Skill#09
Claude Code CLAUDE.md — Best Practices Template

Production-tested CLAUDE.md template for Claude Code projects. Covers coding conventions, test requirements, git workflow, and project-specific AI instructions.

by Skill Factory·245 views
$ tokrepo install claude-code-claude-md-best-practices-template-b152c845
Skill#10
Claude Code Agent: Architect Reviewer

Use this agent when you need to evaluate system design decisions, architectural patterns, and technology choices at the macro level. Example context: Team has...

by TokRepo精选·25 views
$ tokrepo install claude-code-agent-architect-reviewer-a8044a3f
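
If you would rather install the assets one at a time in the recommended sequence instead of using the pack command, the per-asset commands above can be scripted. The sketch below only assumes what this page shows, namely that `tokrepo install <slug>` is the command form. That the CLI is on your PATH and returns a non-zero exit code on failure are assumptions. It defaults to a dry run that prints the commands.

```python
import subprocess

# Slugs copied from the asset cards above, in the recommended install order.
ORDERED_SLUGS = [
    "codebase-memory-mcp-code-intelligence-ai-agents-a3fe5165",
    "claude-code-agent-codebase-explorer-7b3ad600",
    "claude-code-agent-codebase-pattern-finder-bb32e488",
    "ast-index-fast-code-search-for-agents",
    "codegraphcontext-graph-index-for-code-mcp",
    "graphify-repo-knowledge-graph-mcp",
    "claude-code-agent-legacy-modernizer-dccb0175",
    "claude-code-agent-technical-debt-manager-6285fca6",
    "claude-code-claude-md-best-practices-template-b152c845",
    "claude-code-agent-architect-reviewer-a8044a3f",
]


def install_commands(slugs=ORDERED_SLUGS):
    """Build one argv list per asset, preserving order."""
    return [["tokrepo", "install", slug] for slug in slugs]


def install_in_order(dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    commands = install_commands()
    for command in commands:
        if dry_run:
            print("$ " + " ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


install_in_order(dry_run=True)
```

Stopping at the first failure matters here because later tools in the chain consume what earlier ones produce.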
FAQ

Frequently asked questions

Won't an AI agent just hallucinate the architecture?

It will if you let it answer from raw file context. That's exactly why the first three picks (Codebase Memory MCP, Codebase Explorer, Pattern Finder) exist: they give the agent a structured index and inventory to query instead of guessing. Hallucination drops sharply when the answer is grounded in a real graph rather than a 200k-token window of best-guess source files. Verify the first few answers against the code by hand; once they line up, you can lean on the index with more confidence.

How long does indexing a million-line repo actually take?

Plan 30 to 90 minutes the first time for Codebase Memory MCP and CodeGraphContext on a mid-size laptop. ast-index is much faster, usually under five minutes. Run the heavy indexers overnight on day one; subsequent queries are sub-second. Incremental reindex on file save is supported by all three and adds only milliseconds per change.
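
Why is an incremental reindex so much cheaper than the first pass? One common approach is to fingerprint each file and reprocess only the files whose fingerprint changed. The sketch below illustrates that idea with content hashes. It is an assumption about the general technique, not a description of how Codebase Memory MCP, CodeGraphContext, or ast-index implement it, and the `snapshot` and `diff_snapshots` helpers are invented for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root, pattern="*.py"):
    """Map each matching file (relative path) to a SHA-256 of its contents."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob(pattern)
        if path.is_file()
    }


def diff_snapshots(old, new):
    """Return (files to reindex, files to drop from the index)."""
    changed = sorted(name for name, digest in new.items() if old.get(name) != digest)
    removed = sorted(set(old) - set(new))
    return changed, removed


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "a.py").write_text("x = 1\n", encoding="utf-8")
    (repo / "b.py").write_text("y = 2\n", encoding="utf-8")
    before = snapshot(repo)

    (repo / "b.py").write_text("y = 3\n", encoding="utf-8")  # edited
    (repo / "c.py").write_text("z = 4\n", encoding="utf-8")  # added
    (repo / "a.py").unlink()                                  # deleted

    print(diff_snapshots(before, snapshot(repo)))  # prints (['b.py', 'c.py'], ['a.py'])
```

Only two files need reparsing after the edit, no matter how large the rest of the repo is.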

Can I really write a useful CLAUDE.md in week one?

Yes, if you treat the output as a draft and pair it with one veteran review. The per-module summaries, debt triage, and team-idiom list give you a structured first draft most teams have never had. Spend 30 minutes with a long-tenured engineer marking what's wrong, then ship version 0.1 of CLAUDE.md and AGENTS.md. Iterate weekly. The team will start contributing once they see something concrete to correct.

Why both CLAUDE.md and AGENTS.md — isn't one enough?

Claude Code reads CLAUDE.md by convention; tool-agnostic agents (Codex, Cursor, OpenCode, generic agent SDKs) increasingly look for AGENTS.md. Keeping both with the same content covers every agent the team might use in the next two years. Symlink them or generate from a single source-of-truth file to avoid drift.
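
Keeping the two files from drifting can be automated. The sketch below treats AGENTS.md as the single source of truth and makes CLAUDE.md a relative symlink to it, falling back to a plain copy where symlinks are not permitted. Which file is the source is an arbitrary choice made for this example, and the `mirror_agent_docs` helper is hypothetical. It refuses to overwrite a CLAUDE.md that has already diverged.

```python
import tempfile
from pathlib import Path


def mirror_agent_docs(repo_root, source_name="AGENTS.md", alias_name="CLAUDE.md"):
    """Make `alias_name` mirror `source_name`; return "symlink" or "copy"."""
    root = Path(repo_root)
    source, alias = root / source_name, root / alias_name
    if not source.is_file():
        raise FileNotFoundError(f"{source} must exist before it can be mirrored")

    if alias.is_symlink():
        alias.unlink()  # stale link: safe to recreate
    elif alias.exists():
        if alias.read_text(encoding="utf-8") != source.read_text(encoding="utf-8"):
            raise FileExistsError(f"{alias} has diverged; merge by hand first")
        alias.unlink()  # identical copy: safe to replace

    try:
        alias.symlink_to(source_name)  # relative target survives moving the repo
        return "symlink"
    except (OSError, NotImplementedError):
        # Some platforms or filesystems disallow symlinks; fall back to a copy.
        alias.write_text(source.read_text(encoding="utf-8"), encoding="utf-8")
        return "copy"


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "AGENTS.md").write_text("# Agent guide\n", encoding="utf-8")
    mode = mirror_agent_docs(tmp)
    same = (Path(tmp) / "CLAUDE.md").read_text(encoding="utf-8") == "# Agent guide\n"
    print(mode, same)
```

If the fallback copy is in use, rerun the helper from a pre-commit hook or CI step so the copy is refreshed whenever AGENTS.md changes.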

What if my repo is private and I can't send code to a cloud LLM?

All ten picks support local-first or self-hosted deployment. Codebase Memory MCP, CodeGraphContext, Graphify, and ast-index run entirely on your machine. The Claude Code subagents and CLAUDE.md template work with any model the agent can talk to, including local Ollama or vLLM endpoints. Paired with a local model endpoint, you can work through the entire pack without a single byte of code leaving the laptop.
