Skills · May 19, 2026 · 2 min read

Claude Code Agent: Model Evaluator

AI model evaluation and benchmarking specialist. Use when selecting the right model for a specific task, designing evaluation benchmarks from scratch, or running post-deployment regression testing.

Agent ready

Safe staging for this asset

This asset is staged first. The copied prompt asks the agent to inspect the staged files before activating scripts, MCP config, or global config.

Stage only · 35/100 · Policy: staging
Agent surface: Any MCP/CLI agent
Type: Agent
Installation: Single
Trust: Established
Entry point: ai-specialists/model-evaluator
Safe staging command:
npx -y tokrepo@latest install 580e7db0-bfaa-4879-ac81-b8b5e58394aa --target codex

The command stages the files first; activation requires reviewing the README and the staged plan.

What This Agent Is For

AI model evaluation and benchmarking specialist. Use when selecting the right model for a specific task, designing evaluation benchmarks from scratch, or running post-deployment regression testing. Specifically:

Context: A product team needs to choose between Claude Sonnet, GPT-4o, and Gemini 1.5 Pro for a customer support summarization pipeline with a $500/month budget.
user: "We need to pick a model for our customer support summarization system. We process 50k tickets/month and need under 2s latency."
assistant: …
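The constraints in the scenario above reduce to simple arithmetic: a $500/month budget spread over 50k tickets leaves about one cent per ticket. The sketch below works that out and filters a candidate list against the cost and latency ceilings. It assumes the whole budget goes to model usage. The `Candidate` type, the `model-a`/`model-b`/`model-c` labels, and their cost and latency figures are made-up placeholders for illustration, not measurements or published prices of any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                    # placeholder label, not a real model
    cost_per_ticket_usd: float   # hypothetical average cost per ticket
    p95_latency_s: float         # hypothetical 95th-percentile latency


def per_ticket_budget(monthly_budget_usd: float, tickets_per_month: int) -> float:
    """Maximum average spend per ticket that stays within the monthly budget."""
    return monthly_budget_usd / tickets_per_month


def shortlist(candidates, max_cost_usd, max_latency_s):
    """Keep candidates at or under the cost ceiling and strictly under the latency ceiling."""
    return [
        c for c in candidates
        if c.cost_per_ticket_usd <= max_cost_usd and c.p95_latency_s < max_latency_s
    ]


# Figures taken from the scenario: $500/month, 50k tickets/month, under 2 s.
budget = per_ticket_budget(500.0, 50_000)

# Invented numbers, for illustration only.
candidates = [
    Candidate("model-a", 0.004, 1.2),
    Candidate("model-b", 0.015, 0.9),
    Candidate("model-c", 0.008, 2.6),
]

viable = shortlist(candidates, budget, 2.0)
print(f"per-ticket budget: ${budget:.4f}")
print("viable:", [c.name for c in viable])
```

In practice the cost and latency columns would come from running each candidate on a sample of real tickets; the filter only narrows the field before any quality comparison.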

Category: AI Specialists. Expected tool surface: Read, Write, Edit, Bash, Glob, Grep, WebSearch.

Agent Activation Brief

Use this asset when a task needs a focused specialist for work in the AI Specialists category. Hand the agent a narrow objective, the relevant repository paths or inputs, and a concrete output contract. Ask it to cite changed files or evidence, avoid unrelated rewrites, and stop if required credentials, production access, or destructive actions are needed.
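One way to keep a brief consistent across sessions is to build it from a small structure and render it to text. The sketch below is an illustration only: the `ActivationBrief` class, its field names, and the example paths are hypothetical and are not part of the upstream agent definition or of TokRepo.

```python
from dataclasses import dataclass, field


@dataclass
class ActivationBrief:
    """Hypothetical container for the pieces the brief asks for."""
    objective: str
    inputs: list[str]
    output_contract: str
    stop_conditions: list[str] = field(default_factory=lambda: [
        "required credentials are missing",
        "production access is needed",
        "a destructive action is needed",
    ])

    def render(self) -> str:
        """Render the brief as plain text to hand to the agent."""
        lines = [
            f"Objective: {self.objective}",
            "Inputs:",
            *[f"  - {path}" for path in self.inputs],
            f"Output contract: {self.output_contract}",
            "Cite changed files or evidence. Avoid unrelated rewrites.",
            "Stop and report if:",
            *[f"  - {cond}" for cond in self.stop_conditions],
        ]
        return "\n".join(lines)


# Example values are invented for illustration.
brief = ActivationBrief(
    objective="Compare candidate models for ticket summarization",
    inputs=["evals/summarization/", "data/sample_tickets.jsonl"],
    output_contract="A comparison table plus a one-paragraph recommendation",
)
text = brief.render()
print(text)
```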

Operating Boundaries

  • Treat this as a specialist agent, not a general chat prompt.
  • Keep write scope explicit before using it in a coding session.
  • Run normal project tests or verification after accepting its output.
  • Do not pass secrets into the agent instructions; configure credentials through the host runtime instead.
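The last boundary can be checked mechanically before a prompt is sent. The sketch below assumes credentials live in environment variables set by the host runtime and flags any whose value appears verbatim in the instruction text. The `find_leaked_secrets` helper and the `EVAL_API_KEY` variable name are hypothetical; the check only catches exact matches, so it complements review rather than replacing it.

```python
import os


def find_leaked_secrets(text, env_var_names, environ=None):
    """Return the names of environment variables whose values appear in text.

    `environ` defaults to the process environment; passing a dict makes the
    check easy to exercise without touching real credentials.
    """
    env = os.environ if environ is None else environ
    leaked = []
    for name in env_var_names:
        value = env.get(name)
        if value and value in text:
            leaked.append(name)
    return leaked


# Fake environment and fake key, for illustration only.
fake_env = {"EVAL_API_KEY": "sk-example-123"}

safe_prompt = "Run the benchmark; the API key is provided by the runtime."
unsafe_prompt = "Run the benchmark with key sk-example-123."

safe_result = find_leaked_secrets(safe_prompt, ["EVAL_API_KEY"], fake_env)
unsafe_result = find_leaked_secrets(unsafe_prompt, ["EVAL_API_KEY"], fake_env)
print(safe_result, unsafe_result)
```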

Clean Source

Source and Acknowledgements

Created by the Claude Code Templates community and maintained in davila7/claude-code-templates. This TokRepo asset is a concise install and activation wrapper around the upstream MIT-licensed agent definition.
