SkillsMay 19, 2026·2 min read

Claude Code Agent: Model Evaluator

AI model evaluation and benchmarking specialist. Use when selecting the right model for a specific task, designing evaluation benchmarks from scratch, or running post-deployment re

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 35/100Stage only
Agent surface
Any MCP/CLI agent
Kind
Agent
Install
Single
Trust
Trust: Established
Entrypoint
ai-specialists/model-evaluator
Universal CLI install command
npx tokrepo install 580e7db0-bfaa-4879-ac81-b8b5e58394aa

What This Agent Is For

AI model evaluation and benchmarking specialist. Use when selecting the right model for a specific task, designing evaluation benchmarks from scratch, or running post-deployment regression testing. Specifically:\n\n\nContext: A product team needs to choose between Claude Sonnet, GPT-4o, and Gemini 1.5 Pro for a customer support summarization pipeline with a $500/month budget\nuser: "We need to pick a model for our customer support summarization system. We process 50k tickets/month and need under 2s latency."\nass

Category: AI Specialists. Expected tool surface: Read, Write, Edit, Bash, Glob, Grep, WebSearch.

Agent Activation Brief

Use this asset when a task needs a focused specialist for ai specialists work. Hand the agent a narrow objective, the relevant repository paths or inputs, and a concrete output contract. Ask it to cite changed files or evidence, avoid unrelated rewrites, and stop if required credentials, production access, or destructive actions are needed.

Operating Boundaries

  • Treat this as a specialist agent, not a general chat prompt.
  • Keep write scope explicit before using it in a coding session.
  • Run normal project tests or verification after accepting its output.
  • Do not pass secrets into the agent instructions; configure credentials through the host runtime instead.

Clean Source

🙏

Source & Thanks

Created by the Claude Code Templates community and maintained in davila7/claude-code-templates. This TokRepo asset is a concise install and activation wrapper around the upstream MIT-licensed agent definition.

Discussion

Sign in to join the discussion.
No comments yet. Be the first to share your thoughts.

Related Assets