TOKREPO · AI Arsenal

Stable

PhD Pack: Literature + Research Code

Ten picks for the PhD student who does a real literature review and tries to reproduce the code from papers: Zotero, arXiv MCP, GPT Researcher, the academic-researcher agent, Marker, Nougat, JupyterLab, Papermill, Overleaf, AI Scientist. Search → reference manager → PDF parsing → reading → reproduction → writing.

10 resources

About this pack

What's in this pack

This is the rig for the PhD student or postdoc who is past the "chat with ChatGPT about my topic" phase and into the much harder work of (a) reading 200 papers properly, (b) tracking what cites what, (c) actually running the code the authors released, and (d) eventually writing something defensible. Every pick here is open-source, actively maintained, and earns its slot in the pipeline.

The sharp edge of this pack is that it refuses to pretend AI is a substitute for reading the methodology section. AI is in the loop for lit triage, PDF cleanup, first-pass summarization, code-repro debugging, and drafting — but a PhD who doesn't actually read the methods is a PhD whose thesis defense goes badly. The tools are arranged so the AI never sits between you and the paper itself, only around it.

Install in this order

Zotero — reference manager. Start here, on day one of the PhD. Browser connector grabs metadata + PDF in one click, organizes into collections, syncs across devices, generates BibTeX. If you don't have a single source of truth for citations from week one, you will pay for it in month 36.
arXiv MCP Server — programmatic paper search from inside Claude / Cursor / any MCP-aware client. Search arXiv, fetch metadata and full text, hand a paper to the model with one tool call. The replacement for "open browser, search, copy DOI, paste back".
GPT Researcher — autonomous lit-review agent. Given a query ("transformer scaling laws compute-optimal training"), it searches multiple sources, synthesizes findings, cites references, produces a draft survey. Use as the first-pass map of an unfamiliar subfield — never as the final citation list.
Claude Code Agent: Academic Researcher — a Claude Code subagent tuned for academic workflows: structured paper reading, methodology extraction, citation graph traversal. Lives in your Claude Code project so prompts and conventions are version-controlled with your thesis repo.
Marker — PDF → clean Markdown converter. The single biggest unlock for AI-assisted reading. Marker handles math, tables, figures, multi-column layouts. Convert a 40-page paper to Markdown once, then any LLM can ingest it cleanly without OCR noise eating the methodology.
Nougat — Meta's neural OCR specifically trained on academic documents. Where Marker is fast and general, Nougat is the heavyweight for equation-dense papers (theoretical ML, physics, math). LaTeX-aware output. Use it when Marker garbles a critical proof.
JupyterLab — the notebook IDE where you actually run the paper's released code, modify it, plot variants, sanity-check claims. Multi-document workspace, terminal, file browser. Where reproducibility either happens or doesn't.
Papermill: parameterize and execute notebooks from the command line. Critical when you need to rerun a paper's notebook across, say, 12 seeds or hyperparameter settings to verify the headline figure isn't a single-seed accident. Pairs with JupyterLab for repeatable, scripted experiment runs.
Overleaf (self-hosted) — collaborative LaTeX. The actual writing environment. Self-hosted variant keeps your unpublished thesis off a third-party server, which matters in fields with strict IP / embargo rules. BibTeX flows in directly from Zotero.
AI Scientist — Sakana AI's automated end-to-end paper generation system. Not for generating your actual thesis (don't), but a fascinating reference for what the frontier of AI-assisted scientific writing looks like, and a useful tool for generating ablation-experiment writeup drafts you then heavily edit.
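
To make the "search arXiv, fetch metadata" step concrete, here is a minimal sketch against the public arXiv API (the Atom feed at export.arxiv.org). It is not the MCP server's own tool interface, which this page does not document; the function names `build_query_url` and `parse_feed` are hypothetical helpers introduced only for illustration.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by Atom feeds
API_URL = "http://export.arxiv.org/api/query"


def build_query_url(terms, max_results=5):
    """Build an all-fields search URL for the public arXiv API."""
    query = " AND ".join(f"all:{term}" for term in terms)
    params = {"search_query": query, "start": 0, "max_results": max_results}
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_feed(atom_xml):
    """Extract id, title and author names from an Atom feed string."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title", default="")
        papers.append({
            "id": entry.findtext(ATOM + "id", default="").strip(),
            # Titles in the feed can wrap across lines, so collapse whitespace.
            "title": " ".join(title.split()),
            "authors": [
                author.findtext(ATOM + "name", default="").strip()
                for author in entry.findall(ATOM + "author")
            ],
        })
    return papers
```

Fetching is then one `urllib.request.urlopen(build_query_url([...]))` call whose response body goes into `parse_feed`. Whatever comes back should still land in Zotero before you rely on it.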

How they fit together (research workflow)

  Lit search
  ┌────────────────────────────────────┐
  │ arXiv MCP ──► GPT Researcher       │
  │  (precise)    (broad map)          │
  └─────────────────┬──────────────────┘
                    ▼
         ┌───────────────────┐
         │  Zotero (truth)   │  ◄── BibTeX out to Overleaf
         │  collections +    │
         │  attached PDFs    │
         └─────────┬─────────┘
                   ▼
  PDF parse ┌──────────────────┐
            │ Marker (fast)    │
            │ Nougat (math)    │
            └────────┬─────────┘
                     ▼ clean markdown
        ┌─────────────────────────┐
        │ Academic Researcher     │
        │ Claude Code agent       │ ── summary, citation graph, gaps
        └──────────┬──────────────┘
                   ▼
         Reproduce code
         ┌───────────────────┐
         │ JupyterLab        │
         │   + Papermill     │ ── seed sweeps, ablations
         └────────┬──────────┘
                  ▼
            Writing
         ┌───────────────────┐
         │ Overleaf          │  ◄── citations from Zotero
         │ + AI Scientist    │      (draft only — you write)
         └───────────────────┘

The spine is Zotero as the single source of truth for what you've read. Everything upstream feeds Zotero; everything downstream reads from it. Without that discipline, the whole pipeline rots into a 4,000-tab browser and a thesis you can't reproduce.

Tradeoffs you'll hit

AI summarizing vs actually reading — The biggest risk in this pack. GPT Researcher and the Academic Researcher agent will happily summarize a paper in 30 seconds. That summary is good enough to decide whether to read the paper and dangerously misleading as a substitute for reading the methodology. Hard rule: if you cite a paper in your thesis, you read the methods section unaided. AI is for triage, not for cite-by-vibes.
Reproducibility ceiling — Papermill + JupyterLab let you run released code cleanly, but plenty of papers release code that no longer runs (dead dependencies, missing weights, wrong CUDA version). Budget time for environment archaeology. Pin everything in a conda env export. If a paper's claim collapses on rerun, that's a finding worth a footnote.
Marker vs Nougat — Marker is faster and handles tables well; Nougat is slower but actually parses LaTeX equations correctly. Run Marker first; reach for Nougat only when the math is the point.
Self-hosted Overleaf vs the SaaS — SaaS Overleaf is convenient but your draft is on someone else's machine. Self-hosted on your university cluster (or just a Docker container) is the right call for unpublished work. The cost is one afternoon of setup.
AI Scientist as a tool, not a goal — Generating papers end-to-end with AI is academically and ethically fraught. Treat it as a reference architecture for what's possible, and as a draft-generator for ablation tables — never as a way to bypass the actual scientific contribution.
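
For the "pin everything" advice under the reproducibility ceiling, the usual conda route is `conda env export > environment.yml`. As a complement, the sketch below snapshots the exact versions installed in the current Python environment using only the standard library. The file name `requirements.lock.txt` and both function names are assumptions made for this example.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with missing metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


def write_lockfile(path="requirements.lock.txt"):
    """Write the snapshot to disk and return how many packages were pinned."""
    lines = pinned_requirements()
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return len(lines)
```

Calling `write_lockfile()` inside each reproduction directory right after a successful run records what actually worked, which is the part environment archaeology usually cannot recover later.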

Common pitfalls

Over-trusting an AI summary of a methodology — Summarizers compress; methodology details (loss formulation, regularization, data splits) are exactly what gets compressed away. Reviewers ask about exactly the details a summary drops. Read the methods.
Zotero PDFs scattered across devices: turn on WebDAV or your own sync target on day one. Discovering in year 3 that half your annotated PDFs only exist on a dead laptop is the canonical PhD horror story.
Notebook-only reproduction — a paper's figure_3.ipynb may run end-to-end but skip the actual training. Read what the notebook does before declaring "reproduced".
arXiv-only literature — arXiv is fast but biased toward ML / physics / math. For most of biology, social science, and humanities, the lit lives in journals reachable only via institutional access. Use the arXiv MCP for what arXiv covers, not as a universal source.
Conflating BibTeX entries — Zotero will happily import the same paper twice with slightly different metadata if you click the connector on both arXiv and the journal version. Run a duplicate check before every chapter handoff.
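
The duplicate check from the last pitfall can also be run on an exported .bib file. Zotero has its own Duplicate Items view; the sketch below is an independent check that groups citation keys by normalized title. It assumes one field per line, as in a typical BibTeX export, and it is not a full BibTeX parser. The function names are hypothetical.

```python
import re
from collections import defaultdict

ENTRY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")       # e.g. @article{key,
TITLE_RE = re.compile(r"title\s*=\s*(.+)$", re.IGNORECASE)  # e.g. title = {...},


def normalize_title(raw):
    """Lowercase and keep only letters and digits, so braces and case don't matter."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())


def find_duplicate_titles(bibtex):
    """Map normalized title -> citation keys, for titles that appear more than once."""
    keys_by_title = defaultdict(list)
    current_key = None
    for line in bibtex.splitlines():
        stripped = line.strip()
        entry = ENTRY_RE.match(stripped)
        if entry:
            current_key = entry.group(1)
            continue
        # match() anchors at the start, so "booktitle = ..." is not picked up.
        field = TITLE_RE.match(stripped)
        if field and current_key:
            keys_by_title[normalize_title(field.group(1))].append(current_key)
    return {t: keys for t, keys in keys_by_title.items() if len(keys) > 1}
```

Title matching is deliberately crude: a preprint whose title changed before journal publication will slip through, so treat an empty result as "no obvious duplicates" rather than proof.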

INSTALL · ONE COMMAND

$ tokrepo install pack/phd-researcher-lit-code

pass it to your agent, or paste it into your terminal

What's inside

10 ready-to-install resources

Skill#01

Zotero — Free Research Source Manager and Citation Tool

Zotero is a free, open-source reference management tool that helps you collect, organize, annotate, cite, and share research sources. Available on Windows, macOS, Linux, and iOS, it supports one-click saving from browsers and generates citations in thousands of styles.

by AI Open Source·218 views

$ tokrepo install zotero-free-research-source-manager-citation-tool-74dca4cd

MCP#02

arXiv MCP Server — Search and Analyze Papers

arxiv-mcp-server is an MCP server for searching and analyzing arXiv papers, with uvx/uv tool stdio launch examples for reproducible research workflows.

by MCP Hub·183 views

$ tokrepo install arxiv-mcp-server-search-and-analyze-papers

Skill#03

GPT Researcher — Autonomous Research Report Agent

AI agent that generates detailed research reports from a single query. Searches multiple sources, synthesizes findings, and cites references.

by TokRepo精选·4228 views

$ tokrepo install gpt-researcher-autonomous-research-report-agent-23330210

Skill#04

Claude Code Agent: Academic Researcher

Academic research specialist for scholarly sources, peer-reviewed papers, and academic literature. Use PROACTIVELY for research paper analysis, literature reviews, citation...

by TokRepo精选·165 views

$ tokrepo install claude-code-agent-academic-researcher-ed4529f4

Skill#05

Marker — Convert PDF to Markdown with High Accuracy

Fast, accurate PDF to Markdown + JSON converter. Handles tables, images, equations, code blocks, and multi-column layouts. GPU-accelerated. 33K+ GitHub stars.

by Script Depot·280 views

$ tokrepo install marker-convert-pdf-markdown-high-accuracy-42976daf

Skill#06

Nougat — Neural Optical Understanding for Academic Documents

Nougat is a visual transformer model from Meta that converts academic PDF pages into structured Markdown, accurately preserving mathematical equations, tables, and text formatting.

by AI Open Source·169 views

$ tokrepo install nougat-neural-optical-understanding-academic-documents-ed1264b8

Skill#07

JupyterLab — Next-Generation Interactive Development Environment

The extensible web-based IDE for notebooks, code, and data from Project Jupyter, succeeding the classic Jupyter Notebook interface.

by AI Open Source·211 views

$ tokrepo install jupyterlab-next-generation-interactive-development-4de315f7

Skill#08

Papermill — Parameterize and Execute Jupyter Notebooks

Papermill is a Python tool for parameterizing, executing, and analyzing Jupyter notebooks programmatically, enabling notebook-based pipelines and report generation.

by AI Open Source·274 views

$ tokrepo install papermill-parameterize-execute-jupyter-notebooks-4be4a73a

Skill#09

Overleaf — Self-Hosted Collaborative LaTeX Editor

Overleaf is an open-source web-based LaTeX editor that enables real-time collaborative document editing. Self-host it with Docker to keep your academic papers and technical documents on your own infrastructure.

by Script Depot·178 views

$ tokrepo install overleaf-self-hosted-collaborative-latex-editor-8d4b8be6

Prompt#10

AI Scientist — Automated Research Paper Generation

Fully automated AI system that conducts research, runs experiments, and writes complete scientific papers. Generates novel ideas, implements them, and produces LaTeX manuscripts. 12,000+ stars.

by Prompt Lab·300 views

$ tokrepo install ai-scientist-automated-research-paper-generation-0a2623ca

Frequently asked questions

I'm at the start of my PhD — do I really need all ten of these on day one?

No — install Zotero, JupyterLab, and Overleaf in week one, because those three become muscle memory and migration cost compounds. Add arXiv MCP and the academic-researcher agent in month two once you've found your subfield. Marker, Nougat, Papermill, and AI Scientist arrive when you hit the specific problem each solves — don't preinstall solutions to problems you don't have yet.

Can an AI agent actually do my literature review for me?

Not in any way that survives a thesis defense. GPT Researcher and the academic-researcher agent are excellent at producing a first-pass map of an unfamiliar field — that map is roughly the quality of a third-year undergrad's literature review. Use it to find the seminal papers and identify the major camps, then read those papers yourself. Submitting an AI-generated review as your literature chapter is plagiarism in most universities and intellectual self-sabotage in all of them.

Marker or Nougat — which PDF-to-text tool should I install first?

Install Marker first. It's faster, handles tables and figures well, and copes acceptably with most papers. Add Nougat when you start working with equation-heavy theoretical papers: it was trained specifically on academic documents and preserves LaTeX math far better. Running both and picking per-paper is also fine; storage and compute are cheap, missed equations are not.
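
Picking a converter per paper can be scripted. The sketch below only builds the command line and runs nothing. The executables and flags (`marker_single ... --output_dir` and `nougat ... -o`) are assumed from each project's README and may differ between versions, so check each tool's `--help` before relying on them. The function name and the `parsed/` directory layout are hypothetical.

```python
from pathlib import Path


def conversion_command(pdf, out_root="parsed", math_heavy=False):
    """Return the argv list for converting one PDF with Marker or Nougat."""
    pdf = Path(pdf)
    if math_heavy:
        # Assumed Nougat CLI shape: nougat <pdf> -o <output_dir>
        return ["nougat", str(pdf), "-o", str(Path(out_root) / "nougat")]
    # Assumed Marker CLI shape: marker_single <pdf> --output_dir <output_dir>
    return ["marker_single", str(pdf), "--output_dir", str(Path(out_root) / "marker")]
```

Passing the returned list to `subprocess.run(cmd, check=True)` would execute it. Keeping the two outputs in separate directories makes it easy to diff them when an equation looks wrong.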

How do I keep my PhD reproducible if I'm running 50 different notebooks across different papers?

Three rules. (1) Every reproduction lives in its own directory with its own environment.yml or requirements.txt pinned to exact versions. (2) Use Papermill to invoke notebooks via parameters rather than editing in place, so the source notebook stays clean and the run record stays auditable. (3) Save the executed notebook and its outputs alongside the input parameters, so two years later you can prove what you ran. Conda environments, git, and a RUNS/ directory of executed Papermill outputs remove most reproducibility pain.
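
Rules (2) and (3) can be sketched with Papermill's Python API. This assumes the source notebook has a cell tagged `parameters` that defines a `seed` variable. The parameter name, the output naming scheme, and both function names are illustrative assumptions. Planning is kept separate from execution so the plan can be inspected without Papermill installed.

```python
import json
from pathlib import Path


def plan_runs(notebook, seeds, runs_dir="RUNS"):
    """Return one (output_path, parameters) pair per seed, without executing anything."""
    stem = Path(notebook).stem
    return [
        (Path(runs_dir) / f"{stem}_seed{seed}.ipynb", {"seed": seed})
        for seed in seeds
    ]


def execute_plan(notebook, plan):
    """Run each planned notebook with Papermill and save its parameters next to it."""
    import papermill as pm  # imported lazily so planning works without Papermill

    for output_path, params in plan:
        output_path.parent.mkdir(parents=True, exist_ok=True)
        # The source notebook is only read; the executed copy goes to output_path.
        pm.execute_notebook(str(notebook), str(output_path), parameters=params)
        output_path.with_suffix(".params.json").write_text(json.dumps(params, indent=2))
```

For example, `execute_plan("figure_3.ipynb", plan_runs("figure_3.ipynb", range(12)))` would leave twelve executed notebooks and twelve parameter files under RUNS/, ready to commit next to the pinned environment file.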

Is it ethical to use AI Scientist or Claude to help write my thesis?

Depends entirely on your university's policy and your honest disclosure. Common consensus as of 2026: AI is fine for outlining, grammar, idea-stress-testing, and generating draft prose you then heavily rewrite — the same way a writing tutor would help. AI is not fine for generating original analysis, fabricating citations, or producing prose you submit unedited. When in doubt, disclose in the methods section. The point of a PhD is that you can defend every sentence; if you can't defend a paragraph an AI wrote, don't include it.
