AI HR + Recruiting Stack
Ten picks for the recruiter or HR lead putting AI into the funnel: source candidates, parse and screen resumes, prep interview questions, capture and summarise calls, draft offer letters, and onboard, with a bias-audit pass before any recommendation reaches a human. ATS connectors via MCP, not chatbots.
What's in this pack
This is the stack a recruiter or HR lead would actually wire up to handle a hiring round end-to-end — not a 50-vendor demo day. Every pick here does one job in the funnel a real opening goes through: find the candidates, parse the resumes, screen them against the JD, prep the interview, capture the call, draft the offer, onboard the new hire. And one tool that sits across every step: the bias-audit pass that runs before a human gets a recommendation.
The stack is agent-driven on purpose. The recruiter spends the morning wiring it up; from then on the agents do the grunt work — boolean searches, resume reformatting, screening summaries, transcript notes — and the recruiter only steps in where judgment is required (the call itself, the negotiation, the close). Critically: no auto-reject. Every screening step produces a ranked list with reasons, never a hidden filter that drops a candidate before a human sees them.
Install in this order
- Tavily Search — the search engine your sourcing agent calls. Pull "senior React engineers who blogged about Server Components in 2026," "compensation benchmarks for staff PM in Berlin," or "why Acme just laid off their growth team." Free tier covers 1,000 queries/month — enough for a small team.
- Apify MCP Server — 8,000+ pre-built scrapers (LinkedIn-style profiles, job boards, GitHub, Stack Overflow Careers) exposed as MCP tools. Use this once Tavily's text snippets aren't enough and you need structured candidate rows you can dedupe and rank.
- Jina Reader: https://r.jina.ai/<url> returns clean markdown of any page. The unglamorous workhorse: paste a candidate's personal site, a competitor's careers page, or a 40-page benefits PDF, get back text the LLM can actually reason over.
- Reactive Resume: open-source resume builder that doubles as a parser. Candidates submit raw resumes; you export them through Reactive into a consistent JSON Resume schema before any LLM screen. Same fields every time = comparable screening signals.
- Docling: document parser from IBM Research for PDFs, DOCX, and scanned resumes. Handles the messy real-world cases Reactive can't (1990s scanned CVs, two-column EU formats, image-only PDFs). Output is structured markdown your screening agent can chew on.
- Phoenix Evals — LLM-as-judge library with built-in templates. This is where the actual screening happens: define your scorecard (years of relevant experience, domain match, communication clarity), Phoenix runs the same prompt against every candidate, returns numeric rubric scores with rationale. Auditable, reproducible.
- Anarlog — open-source local AI meeting notes. Records and transcribes screening calls on the recruiter's machine — candidate audio never leaves the laptop. Output is a summary + action items you can drop into the ATS without uploading to a third-party SaaS.
- Faster Whisper: up to 4x faster than OpenAI's reference Whisper implementation, runs locally. The transcription engine Anarlog and your batch interview pipeline use under the hood. Switch to this when you have 20 phone screens a week and need turnaround in minutes, not hours.
- Prompt Perfect — system prompt engineering templates. Use it to keep your offer-letter prompt, your rejection prompt, your reference-check prompt under version control. "Generic friendly tone, no comp numbers, mention next step" should produce the same letter on Monday as on Friday.
- Claude Code Agent: AI Ethics Advisor — the gate. Before any shortlist goes to a hiring manager, the Ethics Advisor reviews the screening rubric and the resulting ranking for protected-class proxies (zip code, school name, graduation year, photo). Flags get sent back to the recruiter, never auto-applied. This is the only step that's allowed to block a downstream action.
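The Jina Reader pattern in the list above is only a URL prefix, so a sourcing agent can call it with the standard library alone. A minimal sketch: `reader_url` and `fetch_markdown` are hypothetical helper names, and the 30-second timeout and User-Agent string are assumptions. Check Jina's current documentation for rate limits and API keys before relying on it.

```python
from urllib.request import Request, urlopen

JINA_READER_PREFIX = "https://r.jina.ai/"  # prefix described in the install list


def reader_url(page_url: str) -> str:
    """Build the Reader URL for a page by prepending the r.jina.ai prefix."""
    page_url = page_url.strip()
    if not page_url.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute http(s) URL, got {page_url!r}")
    return JINA_READER_PREFIX + page_url


def fetch_markdown(page_url: str, timeout: float = 30.0) -> str:
    """Fetch the page through the Reader and return the response body as text.

    Makes a network call; the timeout and User-Agent are assumed defaults.
    """
    request = Request(
        reader_url(page_url),
        headers={"User-Agent": "sourcing-agent-sketch"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_markdown("https://example.com/careers")` would request `https://r.jina.ai/https://example.com/careers` and hand the text back to whichever agent asked for it.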
How they fit together
┌─ Tavily ──── Apify MCP ─┐
│ (search)     (scrape)   │          SOURCE
└────────────┬────────────┘
             ▼
  Jina Reader (URL → text)
             │
             ▼
 Reactive Resume ── Docling          SCREEN
  (JSON schema)  (messy PDFs)
             │
             ▼
       Phoenix Evals
    (LLM rubric, scored)
             │
             ▼ ──────────────────────┐
     AI Ethics Advisor               │
     (bias audit, gate)              │
             │                       │
             ▼                       │
  Anarlog + Faster Whisper       INTERVIEW
 (record + transcribe call)          │
             │                       │
             ▼                       │
       Prompt Perfect              OFFER
  (offer letter, rejection,      + ONBOARD
  reference check templates)
The non-obvious join is Phoenix Evals → Ethics Advisor: Phoenix gives you a defensible, repeatable scorecard; the Ethics Advisor inspects that scorecard for proxy variables before the ranking is shown to anyone. Without the gate, an LLM-as-judge pipeline can silently re-encode every bias in the training data. With it, you have a paper trail.
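Part of that gate can be approximated with a plain keyword scan over the written rubric before any LLM review. A rough sketch, not the Ethics Advisor agent itself: `PROXY_TERMS` is seeded from the proxies this pack names, and `ProxyFlag` and `find_proxy_criteria` are hypothetical names. It only reports matches for the recruiter and changes nothing.

```python
from dataclasses import dataclass

# Proxy terms taken from the examples in this pack; extend for your own context.
PROXY_TERMS = ("zip code", "school name", "graduation year", "photo")


@dataclass(frozen=True)
class ProxyFlag:
    criterion: str  # rubric criterion that triggered the flag
    term: str       # proxy term found inside it


def find_proxy_criteria(rubric_criteria: list[str]) -> list[ProxyFlag]:
    """Return one flag per (criterion, proxy term) match, in rubric order."""
    flags = []
    for criterion in rubric_criteria:
        lowered = criterion.lower()
        for term in PROXY_TERMS:
            if term in lowered:
                flags.append(ProxyFlag(criterion, term))
    return flags


rubric = [
    "Years of relevant experience",
    "Domain match",
    "Communication clarity",
    "Graduation year within the last 10 years",
]

# Flags are printed for a human to review; nothing is removed from the rubric.
for flag in find_proxy_criteria(rubric):
    print(f"REVIEW: '{flag.criterion}' mentions '{flag.term}'")
```

A substring match like this misses paraphrased proxies ("recent graduate"), so it is a cheap first filter in front of the reviewer, not a replacement for it.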
Tradeoffs you'll hit
- Reactive Resume vs Docling — Reactive is opt-in (candidate uses the builder); Docling is mandatory (you parse whatever comes in). Run both: Reactive for clean schema when the candidate cooperates, Docling for the 40% of inbound that arrives as a scanned PDF from 2014.
- Anarlog (local) vs cloud meeting bots: Anarlog keeps the audio on the recruiter's laptop. Cloud bots (Fireflies, Otter) are faster to set up but store candidate audio with a US-based vendor that may or may not be GDPR-cleared for your region. For EU candidates specifically, default to local.
- Phoenix Evals vs hand-graded screens — Phoenix is reproducible and fast; a recruiter reading every resume is irreplaceable for the top-of-funnel signal a rubric can't capture. The right mix is Phoenix for the first pass (cut 200 → 30), human for the second (30 → 8).
- Auto-applying Ethics Advisor flags — don't. The Ethics Advisor is a reviewer, not an enforcer. Auto-rejecting candidates because the model flagged a proxy is exactly the failure mode you're trying to avoid. Flags go to the recruiter; the recruiter decides.
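The "ranked list with reasons, never a hidden filter" rule and the 200 → 30 first pass can live in the same function. A minimal sketch under assumptions: `ScoredCandidate`, `RankedRow`, and `rank_for_review` are hypothetical names, and the scores and rationales are assumed to come from your rubric step. Every candidate stays in the output; the first-pass cut is only a marker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCandidate:
    candidate_id: str
    score: float    # rubric total from the screening step
    rationale: str  # judge's written reason, kept for the audit trail


@dataclass(frozen=True)
class RankedRow:
    rank: int
    candidate: ScoredCandidate
    in_first_pass: bool  # True = read by a human in the second pass first


def rank_for_review(candidates, first_pass_size=30):
    """Rank everyone; mark the top slice instead of dropping the rest."""
    # Ties are broken by candidate_id so the ordering is reproducible.
    ordered = sorted(candidates, key=lambda c: (-c.score, c.candidate_id))
    return [
        RankedRow(rank=i, candidate=c, in_first_pass=i <= first_pass_size)
        for i, c in enumerate(ordered, start=1)
    ]


pool = [
    ScoredCandidate("c-101", 7.0, "Direct domain experience, clear writing"),
    ScoredCandidate("c-102", 4.0, "Adjacent domain, thin detail"),
    ScoredCandidate("c-103", 7.0, "Direct domain experience, short tenure"),
]
rows = rank_for_review(pool, first_pass_size=2)
for row in rows:
    marker = "REVIEW FIRST" if row.in_first_pass else "still visible"
    print(row.rank, row.candidate.candidate_id, marker, "|", row.candidate.rationale)
```

Because the output has as many rows as the input, a recruiter can always scroll past the cut line and see why someone landed below it.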
Common pitfalls
- Letting the screening rubric live in someone's head — Phoenix wants a written rubric. If you can't articulate "3 points for direct domain experience, 2 for adjacent, 1 for transferable, 0 for unrelated," the screen isn't reproducible and the bias audit can't catch anything. Write the rubric before you wire up the agent.
- Sending candidate PII through a third-party LLM: hosted Claude/OpenAI endpoints may retain prompts under the vendor's data-retention terms. For resume content that includes name, email, address, or school, default to a local model (Ollama with a small open-weight model, such as an 8B Llama variant) for the screening step, and send only the rubric score upstream. Reserve cloud calls for the offer letter, not the screen.
- ATS "AI integration" claims — most ATS vendors are reselling GPT calls with a UI. The point of this pack is that you own the prompt, the rubric, and the audit trail — not that you outsource them to a vendor's locked surface. Use the ATS's MCP / webhook layer; skip the bundled "AI screening."
- No human-in-the-loop on rejections: even with a perfect rubric, sending automated rejection emails with no review is the single fastest way to a discrimination complaint. Every rejection passes a human reviewer before it leaves the building.
- Forgetting to delete candidate data on schedule — most jurisdictions cap how long you can hold an applicant's data. Wire a cron into your pipeline that purges resumes + transcripts at the retention cliff. Don't rely on "we'll do it manually."
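The retention purge from the last pitfall can be a short script that a cron job calls. A sketch under loud assumptions: candidate files are assumed to sit under one directory, file modification time stands in for the date the application was received, and the 180-day window is a placeholder, not legal guidance. `purge_expired` is a hypothetical name, and it defaults to a dry run.

```python
import time
from pathlib import Path
from typing import Optional

RETENTION_DAYS = 180  # placeholder; set this from your counsel's guidance
SECONDS_PER_DAY = 86_400


def purge_expired(
    root: Path,
    retention_days: int = RETENTION_DAYS,
    now: Optional[float] = None,
    dry_run: bool = True,
) -> list:
    """Return files older than the retention window; delete them unless dry_run."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * SECONDS_PER_DAY
    expired = [
        path
        for path in sorted(root.rglob("*"))
        if path.is_file() and path.stat().st_mtime < cutoff
    ]
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired
```

Run it with `dry_run=True` first and log the returned paths; once the list looks right, schedule the same call with `dry_run=False`. Copies already written into the ATS or a backup are outside this sketch and need their own purge.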
Frequently asked questions
Is this stack legal to use for hiring decisions in the EU, NYC, or California?
The stack itself is neutral; what makes it compliant or not is how you use it. The EU AI Act treats hiring algorithms as high-risk: you owe documentation, human oversight, and the ability to explain individual decisions. NYC Local Law 144 requires an annual bias audit and candidate notification when an automated employment decision tool is used. California's rules on automated-decision systems in employment point in the same direction. The Ethics Advisor + Phoenix Evals combination produces the audit trail those laws want, but only if you run them and keep the logs. Talk to your employment counsel before going live.
How does this connect to my actual ATS (Greenhouse, Lever, Workable)?
Through MCP or webhooks, not through a chatbot. Most modern ATS platforms expose a REST API with candidate / application / interview endpoints; you wrap that as an MCP server (or use a community one) and your agents call it the same way they call Tavily or Apify. Avoid the temptation to install the ATS vendor's bundled "AI assistant": you lose control of the prompt and the data trail. Keep the ATS as the system of record and the agents as a layer that reads from it, ranks, and writes back structured notes.
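The "reads from it, ranks, and writes back structured notes" layer can be sketched as a small set of functions that an MCP server would then expose as tools. Everything vendor-specific here is hypothetical: the base URL, the endpoint paths, and the field names are placeholders, since real ATS APIs differ. The HTTP transport is injected so the wrapper can be exercised without a live ATS.

```python
from typing import Callable, Optional

# Hypothetical base URL and paths; replace with your ATS vendor's documented API.
ATS_BASE_URL = "https://ats.example.com/api"

# A transport takes (method, url, json_body) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[dict]], dict]


class AtsTools:
    """Functions an agent may call; the ATS stays the system of record."""

    def __init__(self, transport: Transport):
        self._send = transport

    def list_applications(self, job_id: str) -> list:
        """Read the applications for one opening (read-only)."""
        reply = self._send("GET", f"{ATS_BASE_URL}/jobs/{job_id}/applications", None)
        return reply.get("applications", [])

    def add_screening_note(self, application_id: str, score: float, rationale: str) -> dict:
        """Write a structured note back; never changes the application's status."""
        note = {"type": "screening_note", "score": score, "rationale": rationale}
        return self._send("POST", f"{ATS_BASE_URL}/applications/{application_id}/notes", note)
```

Each method would map to one MCP tool in whichever MCP SDK you use. There is deliberately no method for rejecting or advancing a candidate, which keeps the agents on the read-and-annotate side of the line.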
What does the bias audit actually check?
The AI Ethics Advisor inspects the screening rubric for variables that correlate with protected classes without measuring the actual job requirement — common proxies are zip code (race), school prestige (class), graduation year (age), employment gaps (caregiving), photo presence (everything). It also runs the resulting ranking against the input pool to flag disparate impact: if 40% of applicants are women and 10% of the top-20 are women, that's a flag worth a human eyeballing the rubric. It does not, and should not, make the decision itself.
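The disparate-impact check in that example reduces to comparing selection rates between groups. The sketch below uses the four-fifths heuristic (flag any group whose selection rate is under 80% of the highest group's rate), which is a common screening rule swapped in here for illustration, not something this pack's agent is confirmed to use. The function names, the pool size of 200, and the "other applicants" label are assumptions.

```python
from collections import Counter


def selection_rates(pool_groups, shortlisted_groups):
    """Share of each group in the pool that made the shortlist."""
    pool = Counter(pool_groups)
    picked = Counter(shortlisted_groups)
    return {group: picked[group] / count for group, count in pool.items()}


def impact_ratios(rates):
    """Each group's selection rate divided by the highest group's rate."""
    best = max(rates.values())
    if best == 0:
        return {group: 1.0 for group in rates}  # nobody selected; nothing to compare
    return {group: rate / best for group, rate in rates.items()}


def flag_groups(ratios, threshold=0.8):
    """Groups whose impact ratio falls below the threshold, sorted by name."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# The FAQ's example: 40% of applicants are women, 10% of the top 20 are women.
pool = ["women"] * 80 + ["other applicants"] * 120       # 200 applicants
shortlist = ["women"] * 2 + ["other applicants"] * 18    # top 20

rates = selection_rates(pool, shortlist)   # women: 2 of 80, others: 18 of 120
ratios = impact_ratios(rates)
for group in flag_groups(ratios):
    print(f"FLAG for human review: {group} (impact ratio {ratios[group]:.2f})")
```

A flag here means a person should look at the rubric, in line with the answer above; it is not evidence of discrimination on its own, and small pools make the ratio noisy.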
What's the smallest possible version of this pack I can run this week?
Three picks: Docling (parse whatever PDF lands in your inbox), Phoenix Evals (one written rubric, one LLM call per resume), and AI Ethics Advisor (review the rubric and the output ranking before anyone sees it). That's a defensible AI-assisted screen in roughly a day of setup. Add Anarlog + Faster Whisper the week you have more than 10 phone screens. Add Tavily + Apify only when sourcing volume justifies it — most small recruiting teams don't need a sourcing agent.
How much does this whole stack cost to run for a recruiting team?
Realistic baseline: $30-100/month for a small in-house team. Tavily free tier covers 1K queries; Apify pay-as-you-go runs $5-30/mo for typical scraping volume; Jina Reader has a generous free tier; Anarlog, Faster Whisper, Reactive Resume, and Docling are self-hosted and free; Phoenix Evals is open source but the LLM calls it issues are billed to your Claude/OpenAI account (budget $20-50/mo at a few hundred resumes/week). The Ethics Advisor is a Claude Code subagent — included if you already use Claude Code. The hidden cost is the time to write and version your rubric and prompt templates; budget half a day per role family.