Skills · May 19, 2026 · 2 min read

Claude Code Agent: Model Evaluator

AI model evaluation and benchmarking specialist. Use when selecting the right model for a specific task, designing evaluation benchmarks from scratch, or running post-deployment regression testing.

Agent ready

Safe staging for this asset

This asset is placed in staging first. The copied prompt asks the agent to inspect the staged files before activating scripts, MCP config, or global config.

Stage only · 35/100 · Policy: staging
Agent surface: Any MCP/CLI agent
Type: Agent
Installation: Single
Trust: Established
Entry: ai-specialists/model-evaluator
Safe staging command
npx -y tokrepo@latest install 580e7db0-bfaa-4879-ac81-b8b5e58394aa --target codex

Files are placed in staging first; activation requires reviewing the README and the staged plan.

What This Agent Is For

AI model evaluation and benchmarking specialist. Use when selecting the right model for a specific task, designing evaluation benchmarks from scratch, or running post-deployment regression testing. Specifically:

Context: A product team needs to choose between Claude Sonnet, GPT-4o, and Gemini 1.5 Pro for a customer support summarization pipeline with a $500/month budget
user: "We need to pick a model for our customer support summarization system. We process 50k tickets/month and need under 2s latency."
assistant: …
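
The constraints in the example scenario translate directly into a per-ticket cost ceiling, which is often the first number worth computing before comparing models. The sketch below is illustrative only: the budget, volume, and latency figures come from the scenario, while the function names, token counts, prices, and measured latency are hypothetical placeholders and do not reflect any provider's actual pricing.

```python
# Figures from the example scenario: $500/month, 50k tickets/month, under 2 s.
MONTHLY_BUDGET_USD = 500.0
TICKETS_PER_MONTH = 50_000
MAX_LATENCY_S = 2.0


def budget_per_ticket(monthly_budget_usd: float, tickets_per_month: int) -> float:
    """Most that a single summarization call may cost on average."""
    return monthly_budget_usd / tickets_per_month


def cost_per_ticket(input_tokens: int, output_tokens: int,
                    usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one call given per-million-token prices."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


def fits_constraints(cost_usd: float, latency_s: float) -> bool:
    """True when a candidate stays within both the budget and the latency limit."""
    ceiling = budget_per_ticket(MONTHLY_BUDGET_USD, TICKETS_PER_MONTH)
    return cost_usd <= ceiling and latency_s < MAX_LATENCY_S


# Hypothetical candidate: placeholder token counts, prices, and latency.
example_cost = cost_per_ticket(800, 150, usd_per_m_input=1.0, usd_per_m_output=5.0)
print(f"budget per ticket: ${budget_per_ticket(MONTHLY_BUDGET_USD, TICKETS_PER_MONTH):.4f}")
print(f"example cost:      ${example_cost:.5f}")
print("fits:", fits_constraints(example_cost, latency_s=1.4))
```

With the scenario's numbers the ceiling works out to $0.01 per ticket; any candidate whose measured cost or latency exceeds its limit can be dropped before the more expensive quality evaluation.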

Category: AI Specialists. Expected tool surface: Read, Write, Edit, Bash, Glob, Grep, WebSearch.

Agent Activation Brief

Use this asset when a task needs a focused specialist for AI model evaluation work. Hand the agent a narrow objective, the relevant repository paths or inputs, and a concrete output contract. Ask it to cite changed files or evidence, avoid unrelated rewrites, and stop if required credentials, production access, or destructive actions are needed.
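
One way to keep a brief narrow is to assemble it from explicit fields instead of free-form text. The following is a minimal sketch under that assumption; `AgentBrief`, its field names, and the sample path are hypothetical and are not part of the upstream agent definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentBrief:
    """Hypothetical container for the pieces of an activation brief."""
    objective: str
    inputs: List[str]
    output_contract: str
    stop_conditions: List[str] = field(default_factory=lambda: [
        "required credentials are missing",
        "production access is needed",
        "a destructive action is needed",
    ])

    def render(self) -> str:
        """Render the brief as plain text to hand to the agent."""
        lines = [
            f"Objective: {self.objective}",
            "Inputs:",
            *[f"  - {item}" for item in self.inputs],
            f"Output contract: {self.output_contract}",
            "Cite changed files or evidence; avoid unrelated rewrites.",
            "Stop and report if:",
            *[f"  - {cond}" for cond in self.stop_conditions],
        ]
        return "\n".join(lines)


brief = AgentBrief(
    objective="Compare candidate models for support-ticket summarization",
    inputs=["evals/tickets_sample.jsonl"],  # hypothetical path
    output_contract="A Markdown table of cost, latency, and quality per model",
)
print(brief.render())
```

Keeping the stop conditions as a default field means every brief carries them unless the caller deliberately overrides the list.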

Operating Boundaries

  • Treat this as a specialist agent, not a general chat prompt.
  • Keep write scope explicit before using it in a coding session.
  • Run normal project tests or verification after accepting its output.
  • Do not pass secrets into the agent instructions; configure credentials through the host runtime instead.
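
The last boundary can be illustrated with a small pattern: the instructions refer to a credential only by the name of an environment variable, and the value is read from the host runtime at the point of use. This is a sketch, not the asset's own mechanism; `EVAL_API_KEY` and the helper names are hypothetical.

```python
import os

CREDENTIAL_ENV_VAR = "EVAL_API_KEY"  # hypothetical variable name


def load_credential(name: str = CREDENTIAL_ENV_VAR) -> str:
    """Read the credential from the host environment, failing loudly if absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; configure it in the host runtime")
    return value


def build_instructions(objective: str, name: str = CREDENTIAL_ENV_VAR) -> str:
    """Build agent instructions that mention the variable name, never its value."""
    return (f"{objective}\n"
            f"Credentials are provided by the runtime as ${name}; never print them.")


def contains_secret(text: str, secret: str) -> bool:
    """Guard to run before sending any text to the agent."""
    return bool(secret) and secret in text
```

A caller might run `contains_secret(build_instructions(...), load_credential())` as a final check and refuse to send the instructions if it returns `True`.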

Clean Source

Source and Acknowledgments

Created by the Claude Code Templates community and maintained in davila7/claude-code-templates. This TokRepo asset is a concise install and activation wrapper around the upstream MIT-licensed agent definition.
