# Together AI Dedicated Endpoints Skill for Agents > Skill that teaches Claude Code Together AI's dedicated endpoints API. Deploy single-tenant GPU inference with autoscaling, no rate limits, and custom model configurations. ## Install Save the content below to `.claude/skills/` or append to your `CLAUDE.md`: ## Quick Use ```bash npx skills add togethercomputer/skills ``` ## What is This Skill? This skill teaches AI coding agents how to provision and manage Together AI dedicated endpoints. Deploy models on single-tenant GPUs with autoscaling, no rate limits, and custom configurations for production workloads. **Answer-Ready**: Together AI Dedicated Endpoints Skill for coding agents. Single-tenant GPU inference with autoscaling and no rate limits. Custom model configs and production deployment. Part of official 12-skill collection. **Best for**: Teams deploying models for production inference at scale. **Works with**: Claude Code, Cursor, Codex CLI. ## What the Agent Learns ### Create Endpoint ```python from together import Together client = Together() endpoint = client.endpoints.create( model="meta-llama/Llama-3.3-70B-Instruct-Turbo", hardware="gpu-h100-80gb", min_replicas=1, max_replicas=4, autoscale=True, ) print(f"Endpoint URL: {endpoint.url}") ``` ### Hardware Options | GPU | VRAM | Best For | |-----|------|----------| | H100 80GB | 80GB | Large models, high throughput | | H200 | 141GB | Largest models | | A100 80GB | 80GB | Cost-effective | ### Endpoint Management ```python # Scale client.endpoints.update(endpoint.id, min_replicas=2) # Monitor status = client.endpoints.retrieve(endpoint.id) # Delete client.endpoints.delete(endpoint.id) ``` ## FAQ **Q: How is pricing different from serverless?** A: Dedicated endpoints charge per GPU-hour, not per token. Better economics at sustained high volume. ## Source & Thanks > Part of [togethercomputer/skills](https://github.com/togethercomputer/skills) — MIT licensed. ## 快速使用 ```bash npx skills add togethercomputer/skills ``` ## 什么是这个 Skill? 教 AI Agent 管理 Together AI 专属端点,单租户 GPU 推理,自动扩缩容,无速率限制。 **一句话总结**:Together AI 专属端点 Skill,单租户 GPU + 自动扩缩容 + 无限速,官方出品。 ## 来源与致谢 > [togethercomputer/skills](https://github.com/togethercomputer/skills) — MIT --- Source: https://tokrepo.com/en/workflows/cbbd6c05-ac9c-4dd6-9fde-cf5528919ea5 Author: AI Open Source