Skills · Apr 8, 2026 · 1 min read

Together AI Dedicated Containers Skill for Agents

A skill that teaches Claude Code how to use Together AI's container deployment API: run custom Docker inference workers on managed GPU infrastructure with full environment control.

TL;DR
A Claude Code skill for deploying custom Docker inference workers on Together AI GPU infrastructure.
§01

What it is

This skill teaches Claude Code how to use Together AI's dedicated container deployment API. It enables AI agents to deploy custom Docker images as inference workers on managed GPU infrastructure with full environment control, scaling configuration, and health monitoring.

The skill targets developers who use Claude Code to manage AI infrastructure and want their agent to handle container deployments on Together AI's GPU cloud.

§02

How it saves time or tokens

Without this skill, deploying containers on Together AI requires reading API docs, constructing JSON payloads, and managing authentication manually. The skill gives Claude Code the exact API patterns, so you describe what you want in natural language and the agent handles the REST calls, environment configuration, and deployment verification.

§03

How to use

  1. Add this skill to your Claude Code project configuration.
  2. Set your Together AI API key as an environment variable.
  3. Ask Claude Code to deploy, scale, or manage your inference containers.
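Step 2 can be verified with a short Python check before handing work to the agent. This is a minimal sketch; the `TOGETHER_API_KEY` variable name is the one assumed throughout this post:

```python
import os

def auth_header(env_var: str = "TOGETHER_API_KEY") -> dict:
    """Build the Authorization header the API calls need,
    failing early if the key is missing from the environment."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} before asking the agent to deploy.")
    return {"Authorization": f"Bearer {key}"}
```

Failing fast here is cheaper than letting the agent discover a 401 several REST calls into a deployment.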
§04

Example

import os
import requests

# Read the API key from the environment rather than hardcoding it.
TOGETHER_API_KEY = os.environ["TOGETHER_API_KEY"]

# Deploy a custom inference container
response = requests.post(
    'https://api.together.xyz/v1/dedicated/containers',
    headers={'Authorization': f'Bearer {TOGETHER_API_KEY}'},
    json={
        'image': 'my-registry/my-model:latest',  # must be reachable from Together AI
        'gpu_type': 'NVIDIA_A100_80GB',
        'num_gpus': 1,
        'env': {
            'MODEL_NAME': 'my-custom-model',
            'MAX_BATCH_SIZE': '32'
        }
    }
)
response.raise_for_status()  # fail loudly on deployment errors
print(response.json())
§06

Common pitfalls

  • GPU availability varies by type and region. A100 80GB instances may have queues during peak demand. Check availability before committing to a deployment timeline.
  • Container images must be accessible from Together AI's infrastructure. Use a public registry or configure registry credentials in the API call.
  • Dedicated containers have a minimum billing period. Shut down unused containers promptly to avoid unnecessary costs.
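The third pitfall is easy to automate. As a hedged sketch of a teardown helper (the DELETE verb and the `container_id` path segment are assumptions about the API, not documented behavior):

```python
import requests

BASE_URL = "https://api.together.xyz/v1/dedicated/containers"

def container_url(container_id: str) -> str:
    """URL for a single deployed container (path layout is an assumption)."""
    return f"{BASE_URL}/{container_id}"

def shut_down(container_id: str, api_key: str) -> int:
    """Delete a container to stop billing; returns the HTTP status code."""
    resp = requests.delete(
        container_url(container_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    return resp.status_code
```

Wiring a helper like this into the skill lets the agent clean up after failed experiments instead of leaving GPUs billing overnight.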

Frequently Asked Questions

What GPU types does Together AI offer for dedicated containers?

Together AI offers NVIDIA A100 (40GB and 80GB), H100, and other GPU types depending on availability. Check the Together AI documentation for the current list and pricing.

Can I use custom Docker images?

Yes. You provide your own Docker image with your model and serving code. Together AI runs it on their GPU infrastructure with the environment variables and ports you specify.

How does scaling work?

You specify the number of GPUs and replicas in the deployment configuration. Together AI manages the infrastructure scaling. You can update replica counts through the API.
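As an illustration of the replica-count update described above (the PATCH verb and the `replicas` field name are assumptions about the API, not documented behavior), the agent's call might be built like this:

```python
import requests

def scale_payload(replicas: int) -> dict:
    """JSON body for a replica-count update ('replicas' key is an assumption)."""
    if replicas < 0:
        raise ValueError("replica count must be non-negative")
    return {"replicas": replicas}

def scale(container_id: str, replicas: int, api_key: str):
    # PATCH the deployment with the new replica count (endpoint assumed).
    return requests.patch(
        f"https://api.together.xyz/v1/dedicated/containers/{container_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        json=scale_payload(replicas),
    )
```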

Is this skill Claude Code specific?

The skill is designed for Claude Code but the underlying API knowledge applies to any AI agent or manual workflow. The skill format follows Claude Code's CLAUDE.md convention.

How do I monitor deployed containers?

Together AI provides health check endpoints and status APIs. The skill teaches Claude Code how to query container status, check logs, and verify that the deployment is healthy.
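A minimal health-check sketch, assuming the status API returns a JSON object with a `status` field (the endpoint path, field name, and status values below are guesses, not documented behavior):

```python
import requests

HEALTHY_STATES = {"running", "ready"}  # assumed status values

def is_healthy(status_json: dict) -> bool:
    """Interpret a status response; the 'status' field name is an assumption."""
    return status_json.get("status", "").lower() in HEALTHY_STATES

def check(container_id: str, api_key: str) -> bool:
    resp = requests.get(
        f"https://api.together.xyz/v1/dedicated/containers/{container_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    resp.raise_for_status()
    return is_healthy(resp.json())
```

Separating the JSON interpretation (`is_healthy`) from the network call keeps the logic testable without hitting the live API.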


Source & Thanks

Part of togethercomputer/skills — MIT licensed.
