# Cartesia Voice Cloning — Build a Voice Library from Audio

> Cartesia voice cloning creates a custom voice from a 5-30 second sample (10-30 seconds is recommended). Upload it, save it to your library, version it, and share it within your account. Consent attestation is built in.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

1. Record a clean 10-30 second audio sample of the target voice
2. `client.voices.clone(clip=open('sample.wav','rb'), name='...', mode='stability')`
3. Use the returned `voice['id']` in subsequent `client.tts.bytes(...)` calls

---

## Intro

Cartesia's voice cloning creates a high-fidelity custom voice from a 5-30 second audio sample, preserving accent, timbre, and pacing. Voices are saved to your account library, where they can be versioned and shared with team members. Before you clone a real person's voice, the platform requires a consent attestation to protect against misuse.

- Best for: character voices in apps, branded customer support voices, audiobook narration with custom narrators
- Works with: REST upload, Python/JS SDKs
- Setup time: about 5 minutes per voice

---

### Upload + clone a voice

```python
import os

from cartesia import Cartesia

# Read the API key from the environment rather than hard-coding it
client = Cartesia(api_key=os.environ["CARTESIA_API_KEY"])

with open("narrator-sample.wav", "rb") as f:
    voice = client.voices.clone(
        clip=f,
        name="Brand Narrator — Sarah",
        description="Warm mid-30s American female. Used for TokRepo product walkthrough videos.",
        mode="similarity",   # "similarity" (closer to source) | "stability" (more natural)
        enhance=True,        # auto-clean noise before training
    )

print(voice["id"])
```

### Use the cloned voice

```python
audio = client.tts.bytes(
    model_id="sonic-2",
    voice_id=voice["id"],
    transcript="Welcome to TokRepo. Let's walk through what's new this week.",
    output_format={"container": "mp3"},
)
```
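The call above leaves the synthesized audio in memory. As a minimal sketch of persisting it, the helper below writes the result to disk. It assumes `client.tts.bytes(...)` may hand back either a single bytes object or an iterable of byte chunks, depending on the SDK version; `save_audio` is a hypothetical helper name, not part of the Cartesia SDK.

```python
from pathlib import Path


def save_audio(audio, path):
    """Write TTS output to disk and return the number of bytes written.

    Assumption: `audio` is either one bytes-like object or an iterable
    of byte chunks (for example, a generator yielding streamed audio).
    """
    # Normalize both shapes to "an iterable of chunks"
    chunks = [audio] if isinstance(audio, (bytes, bytearray)) else audio
    total = 0
    with Path(path).open("wb") as out:
        for chunk in chunks:
            total += out.write(chunk)
    return total
```

With the example above, `save_audio(audio, "welcome.mp3")` would store the clip next to your script.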

### Voice library management

```python
# List all voices in your account
voices = client.voices.list()
for v in voices:
    print(v["id"], v["name"], v["is_owner"], v["is_starred"])

# Update metadata
client.voices.update(voice["id"], name="Brand Narrator — Sarah (v2)", description="...")

# Delete (cleanup unused)
client.voices.delete(voice["id"])
```
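The update example above marks a new revision by renaming the voice with a `(v2)` suffix. If you adopt that as a naming convention, a small helper can compute the next name. This is only a sketch of one possible convention: the `(vN)` suffix and the `bump_version_name` function are assumptions for illustration, not a Cartesia versioning feature.

```python
import re

# Matches a trailing "(vN)" suffix such as " (v2)" at the end of a name
_VERSION_SUFFIX = re.compile(r"\s*\(v(\d+)\)$")


def bump_version_name(name):
    """Return `name` with its "(vN)" suffix incremented.

    A name without a suffix is treated as version 1, so it becomes "(v2)".
    """
    match = _VERSION_SUFFIX.search(name)
    if match is None:
        return f"{name} (v2)"
    base = name[: match.start()]
    return f"{base} (v{int(match.group(1)) + 1})"
```

The result can then be passed as the `name` argument of `client.voices.update(...)`.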

### Best practices for source audio

| Aspect | Recommendation |
|---|---|
| Length | 10-30 seconds (under 10 → similarity drops; over 30 → no further gain) |
| Content | Cover varied prosody — questions, statements, exclamations |
| Background | Silent room or denoised ahead of time |
| Format | WAV, 16-bit, 24 kHz or higher (MP3 works, but lossy compression artifacts can carry over into the clone) |
| Avoid | Music, multiple speakers in clip, heavy reverb, extreme audio compression |
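
The length and format rows of the table can be checked before uploading. The sketch below uses only Python's standard `wave` module and applies the thresholds from the table (10-30 seconds, 16-bit, 24 kHz or higher). `check_source_clip` is a hypothetical helper, it handles WAV files only, and it cannot detect background noise, music, or multiple speakers.

```python
import wave

MIN_SECONDS = 10.0
MAX_SECONDS = 30.0
MIN_SAMPLE_RATE = 24_000      # Hz
REQUIRED_SAMPLE_WIDTH = 2     # bytes per sample, i.e. 16-bit PCM


def check_source_clip(path):
    """Return a list of problems found; an empty list means the clip passes."""
    problems = []
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        sample_width = wav.getsampwidth()
        duration = wav.getnframes() / sample_rate

    if sample_width != REQUIRED_SAMPLE_WIDTH:
        problems.append(f"expected 16-bit samples, got {8 * sample_width}-bit")
    if sample_rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {sample_rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(
            f"duration {duration:.1f}s is outside {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
        )
    return problems
```

Calling `check_source_clip("narrator-sample.wav")` before `client.voices.clone(...)` lets you reject an unsuitable clip without spending an upload on it.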

### Consent and policy

Cartesia requires you to attest that the source voice is your own or that you have written permission from the voice owner. The platform monitors for misuse, and cloning public figures without consent is grounds for account termination. For commercial brand voices, have your legal team keep the talent release agreement on file.

---

### FAQ

**Q: When should I use similarity vs. stability mode?**
A: Similarity stays closer to the source, which suits character work where resemblance to a specific (consenting) speaker matters most. Stability smooths out variation, which suits long-form narration where artifacts from the source clip would otherwise compound. Default to stability for production unless you specifically need source resemblance.

**Q: Can I clone in a language different from the source?**
A: Yes, cloned voices work across languages. A 10-second English source clip can synthesize Spanish or French output that retains the speaker's vocal characteristics. Accent transfer accuracy varies, so test on representative content.

**Q: How big is my voice library quota?**
A: Free tier: 3 voices. Pro tier: 50. Scale tier: 500+. Cloned voices count toward the limit; pre-built voices do not. Delete unused voices to reclaim slots.
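
To see how close you are to the limit, you can count the voices you own in the result of `client.voices.list()`. The sketch below assumes that `is_owner` is true for voices cloned in your account and false for pre-built voices, and that each entry supports the dictionary-style access used in the listing example. `remaining_clone_slots` is a hypothetical helper, and the quota must be supplied by you from your plan.

```python
def remaining_clone_slots(voices, quota):
    """Return how many more voices can be cloned under `quota`.

    Assumption: entries with a truthy "is_owner" field count toward
    the quota; pre-built voices do not.
    """
    owned = sum(1 for v in voices if v["is_owner"])
    return max(quota - owned, 0)
```

For example, `remaining_clone_slots(client.voices.list(), quota=3)` would report the free slots on the free tier described above.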

---

## Source & Thanks

> Built by [Cartesia](https://github.com/cartesia-ai). Voice cloning docs at [docs.cartesia.ai/voices/clone](https://docs.cartesia.ai).
>
> [cartesia-ai/cartesia-python](https://github.com/cartesia-ai/cartesia-python)

---
Source: https://tokrepo.com/en/workflows/cartesia-voice-cloning-build-a-voice-library-from-audio
Author: Cartesia