# Together AI Batch Inference Skill for Claude Code

> Skill that teaches Claude Code Together AI's batch inference API. Run high-volume async inference jobs at up to 50% lower cost with automatic queuing and result retrieval.

## Install

Save this skill's content to `.claude/skills/`, or append it to your `CLAUDE.md`.

## Quick Use

```bash
npx skills add togethercomputer/skills
```

## What is This Skill?

This skill teaches AI coding agents how to use Together AI's batch inference API for high-volume, asynchronous workloads. Submit thousands of prompts in a single job, pay up to 50% less than real-time inference, and retrieve results when ready.

**Answer-Ready**: A Together AI batch inference skill for coding agents. It covers high-volume async inference at up to 50% cost savings, with automatic queuing, progress tracking, and result retrieval. Part of the official 12-skill collection.

**Best for**: Teams running large-scale LLM inference jobs. **Works with**: Claude Code, Cursor, Codex CLI.

## What the Agent Learns

### Submit Batch Job

```python
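from together import Together

# The client reads the API key from the TOGETHER_API_KEY environment variable
client = Together()

# Upload the input file (JSONL, one request per line)
file = client.files.upload("batch_input.jsonl")

# Create the batch job against the chat completions endpoint
batch = client.batch.create(
    input_file_id=file.id,
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    endpoint="/v1/chat/completions",
)
# Keep the ID: it is needed to check status and fetch results later
print(f"Batch ID: {batch.id}")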
```

### Input Format (JSONL)

```json
{"custom_id": "req-1", "body": {"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", "messages": [{"role": "user", "content": "Summarize this article..."}]}}
{"custom_id": "req-2", "body": {"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", "messages": [{"role": "user", "content": "Translate to French..."}]}}
```
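For more than a handful of prompts, the input file is easier to generate than to write by hand. The following is a minimal sketch that produces lines in the `custom_id`/`body` shape shown above using only the standard library. The helper names `build_batch_lines` and `write_batch_file` and the `req-N` numbering scheme are illustrative choices, not part of the Together SDK.

```python
import json
from pathlib import Path

MODEL = "meta-llama/Llama-3.3-70B-Instruct-Turbo"


def build_batch_lines(prompts, model=MODEL):
    """Return one JSON string per prompt, in the custom_id/body shape."""
    lines = []
    for i, prompt in enumerate(prompts, start=1):
        request = {
            # custom_id lets you match each result back to its request
            "custom_id": f"req-{i}",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        # ensure_ascii=False keeps non-English prompts readable in the file
        lines.append(json.dumps(request, ensure_ascii=False))
    return lines


def write_batch_file(prompts, path="batch_input.jsonl", model=MODEL):
    """Write the requests to a JSONL file and return how many were written."""
    lines = build_batch_lines(prompts, model)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

Calling `write_batch_file(["Summarize this article...", "Translate to French..."])` would write a two-line `batch_input.jsonl` ready for upload.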

### Check Status & Retrieve Results

```python
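# Assumes `client` and `batch` from the submit step above
status = client.batch.retrieve(batch.id)
print(f"Progress: {status.completed}/{status.total}")

# Download the results once the job has finished
if status.status == "completed":
    results = client.files.content(status.output_file_id)
else:
    print(f"Batch not finished yet (status: {status.status})")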
```
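Once the output file is downloaded, each result has to be matched back to its request. The sketch below assumes the output is JSONL in which every line repeats the request's `custom_id` and carries a chat-completions style payload under `response.body`. That layout is an assumption, not something this skill documents, so check a real output file before relying on it. The function name `index_results` is hypothetical.

```python
import json


def index_results(jsonl_text):
    """Map custom_id -> assistant text from batch output (assumed layout)."""
    results = {}
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # Assumed shape: {"custom_id": ..., "response": {"body": {"choices": [...]}}}
        body = (record.get("response") or {}).get("body") or {}
        choices = body.get("choices") or []
        # Requests without a usable response are recorded as None
        text = choices[0]["message"]["content"] if choices else None
        results[record["custom_id"]] = text
    return results
```

Keying by `custom_id` rather than by line position means the lookup still works if the output lines arrive in a different order than the input.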

## FAQ

**Q: How much cheaper is batch inference?**
A: Up to 50% cheaper than real-time. Exact savings depend on model and volume.

**Q: How long does it take?**
A: Results are typically available within 24 hours. Priority processing is available at standard pricing.
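
Because a job can take hours, a script usually polls at a relaxed interval instead of checking once. Here is a minimal sketch of such a loop. It takes any zero-argument callable that returns the current status string, so it does not depend on a particular SDK call. The helper name `wait_for_batch` and the set of terminal status names are assumptions; adjust them to the statuses the API actually reports.

```python
import time

# Assumed terminal status names; compared case-insensitively
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}


def wait_for_batch(fetch_status, interval_s=60.0, timeout_s=24 * 3600,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until a terminal state, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        status = str(fetch_status()).lower()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"batch still {status!r} after {timeout_s} seconds")
        # Wait before the next check to avoid hammering the API
        sleep(interval_s)
```

With the client from the earlier examples, this could be called as `wait_for_batch(lambda: client.batch.retrieve(batch.id).status)`. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting.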

## Source & Thanks

> Part of [togethercomputer/skills](https://github.com/togethercomputer/skills) — MIT licensed.

---
Source: https://tokrepo.com/en/workflows/90286a47-45df-40cf-a8f0-e013e02ecbaf
Author: Script Depot