Skills · May 19, 2026 · 2 min read

Claude Code Agent: Model Evaluator

AI model evaluation and benchmarking specialist. Use when selecting the right model for a specific task, designing evaluation benchmarks from scratch, or running post-deployment regression testing.

Safe staging for this asset

This asset is staged first. The copied prompt tells the agent to inspect the staged files and ask before activating scripts, MCP config, or global config.

Stage only · 35/100
Policy: stage
Agent surface: Any MCP/CLI agent
Kind: Agent
Install: Single
Trust: Established
Entrypoint: ai-specialists/model-evaluator
Safe staging command
npx -y tokrepo@latest install 580e7db0-bfaa-4879-ac81-b8b5e58394aa --target codex

Stages files first; activation requires review of the staged README and plan.

What This Agent Is For

AI model evaluation and benchmarking specialist. Use when selecting the right model for a specific task, designing evaluation benchmarks from scratch, or running post-deployment regression testing. Specifically:

Context: A product team needs to choose between Claude Sonnet, GPT-4o, and Gemini 1.5 Pro for a customer support summarization pipeline with a $500/month budget.
user: "We need to pick a model for our customer support summarization system. We process 50k tickets/month and need under 2s latency."
ass…
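The constraints in the scenario above translate into a per-ticket budget and a latency ceiling that candidates can be screened against. The following is a minimal sketch, not the agent's actual method: the `Candidate` type, the placeholder model names, and every cost and latency figure are hypothetical and would need to be replaced with measurements from your own tickets.

```python
from dataclasses import dataclass

# Constraints taken from the scenario text.
MONTHLY_BUDGET_USD = 500.0
TICKETS_PER_MONTH = 50_000
MAX_LATENCY_S = 2.0


@dataclass
class Candidate:
    name: str
    cost_per_ticket_usd: float  # hypothetical: measure on a sample of real tickets
    p95_latency_s: float        # hypothetical: measure under realistic load


def per_ticket_budget(monthly_budget_usd: float, tickets_per_month: int) -> float:
    """Spread the monthly budget evenly across the expected ticket volume."""
    return monthly_budget_usd / tickets_per_month


def shortlist(candidates, budget_per_ticket, max_latency_s):
    """Return names of candidates that fit the budget and stay under the latency ceiling."""
    return [
        c.name
        for c in candidates
        if c.cost_per_ticket_usd <= budget_per_ticket
        and c.p95_latency_s < max_latency_s
    ]


# Placeholder candidates with made-up numbers, for illustration only.
candidates = [
    Candidate("model-a", cost_per_ticket_usd=0.004, p95_latency_s=1.2),
    Candidate("model-b", cost_per_ticket_usd=0.015, p95_latency_s=0.9),
    Candidate("model-c", cost_per_ticket_usd=0.006, p95_latency_s=2.4),
]

budget = per_ticket_budget(MONTHLY_BUDGET_USD, TICKETS_PER_MONTH)
print(f"Budget per ticket: ${budget:.4f}")
print("Within constraints:", shortlist(candidates, budget, MAX_LATENCY_S))
```

With the scenario's figures, $500 over 50,000 tickets leaves $0.01 per ticket. A screen like this only narrows the field; summary quality on representative tickets still has to be evaluated separately.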

Category: AI Specialists. Expected tool surface: Read, Write, Edit, Bash, Glob, Grep, WebSearch.

Agent Activation Brief

Use this asset when a task needs a focused specialist in the AI Specialists category. Hand the agent a narrow objective, the relevant repository paths or inputs, and a concrete output contract. Ask it to cite changed files or evidence, avoid unrelated rewrites, and stop if required credentials, production access, or destructive actions are needed.
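One way to keep a brief consistent with the guidance above is to assemble it from explicit fields. This is a sketch under assumptions: `ActivationBrief`, its field names, and the example paths are hypothetical and are not part of the upstream agent definition or of TokRepo.

```python
from dataclasses import dataclass, field


@dataclass
class ActivationBrief:
    """Hypothetical container for the pieces of a brief handed to the agent."""
    objective: str
    inputs: list[str]
    output_contract: str
    stop_conditions: list[str] = field(
        default_factory=lambda: [
            "required credentials",
            "production access",
            "destructive actions",
        ]
    )

    def render(self) -> str:
        """Render the brief as plain text to paste into the agent session."""
        return "\n".join(
            [
                f"Objective: {self.objective}",
                "Inputs: " + ", ".join(self.inputs),
                f"Output contract: {self.output_contract}",
                "Cite changed files or evidence; avoid unrelated rewrites.",
                "Stop and ask if any of these are needed: "
                + ", ".join(self.stop_conditions)
                + ".",
            ]
        )


# Example values are illustrative only.
brief = ActivationBrief(
    objective="Compare three candidate models for ticket summarization",
    inputs=["evals/tickets_sample.jsonl", "evals/rubric.md"],
    output_contract="A ranked table with cost, latency, and quality per model",
)
print(brief.render())
```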

Operating Boundaries

  • Treat this as a specialist agent, not a general chat prompt.
  • Keep write scope explicit before using it in a coding session.
  • Run normal project tests or verification after accepting its output.
  • Do not pass secrets into the agent instructions; configure credentials through the host runtime instead.
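Two of the boundaries above, explicit write scope and credentials supplied by the host runtime, can be checked mechanically. The sketch below assumes a hypothetical allow-list of directories and a caller-chosen environment variable name; none of these names come from the agent definition.

```python
import os
from pathlib import PurePosixPath

# Hypothetical allow-list: directories the agent is permitted to modify.
ALLOWED_WRITE_DIRS = [PurePosixPath("evals"), PurePosixPath("reports")]


def in_write_scope(changed_file: str, allowed=ALLOWED_WRITE_DIRS) -> bool:
    """True if a repo-relative path sits inside one of the allowed directories."""
    path = PurePosixPath(changed_file)
    # Reject absolute paths and parent-directory hops outright.
    if path.is_absolute() or ".." in path.parts:
        return False
    return any(path.is_relative_to(d) for d in allowed)  # Python 3.9+


def out_of_scope(changed_files, allowed=ALLOWED_WRITE_DIRS):
    """List the changed files that fall outside the agreed write scope."""
    return [f for f in changed_files if not in_write_scope(f, allowed)]


def require_credential(var_name: str) -> str:
    """Read a credential from the host environment instead of the agent prompt."""
    value = os.environ.get(var_name)
    if not value:
        raise RuntimeError(f"{var_name} is not set in the host environment")
    return value


# Illustrative list of files the agent reports having changed.
changed = ["evals/run_benchmark.py", "src/app.py", "evals/../src/config.py"]
print("Out of scope:", out_of_scope(changed))
```

Running a check like this on the files the agent cites, before the normal project tests, makes scope violations visible early.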


Source & Thanks

Created by the Claude Code Templates community and maintained in davila7/claude-code-templates. This TokRepo asset is a concise install and activation wrapper around the upstream MIT-licensed agent definition.
