# Cartesia Voice Cloning: Build a Voice Library from Audio

> Cartesia voice cloning creates a custom voice from a 5-30 second audio sample. Upload a sample, then save, version, and share the resulting voice within your account. Consent attestation is built in.

## Install

Save the content below to `.claude/skills/` or append it to your `CLAUDE.md`:

## Quick Use

1. Record a clean 10-30 second audio sample of the target voice.
2. Clone it: `client.voices.clone(clip=open('sample.wav','rb'), name='...', mode='stability')`
3. Pass the returned `voice['id']` to subsequent `client.tts.bytes(...)` calls.

---

## Intro

Cartesia's voice cloning creates a high-fidelity custom voice from a 5-30 second audio sample (10-30 seconds recommended), preserving the speaker's accent, timbre, and pacing. Cloned voices are saved to your account library, where they can be versioned and shared with team members. To protect against misuse, the platform requires consent attestation before you clone a real person's voice.

Best for: character voices in apps, branded customer-support voices, and audiobook narration with custom narrators. Works with: REST upload, Python/JS SDKs. Setup time: about 5 minutes per voice.

---

### Upload + clone a voice

```python
import os

from cartesia import Cartesia

client = Cartesia(api_key=os.environ["CARTESIA_API_KEY"])

with open("narrator-sample.wav", "rb") as f:
    voice = client.voices.clone(
        clip=f,
        name="Brand Narrator - Sarah",
        description="Warm mid-30s American female. Used for TokRepo product walkthrough videos.",
        mode="similarity",  # "similarity" (closer to source) | "stability" (more natural)
        enhance=True,       # auto-clean noise before training
    )

print(voice["id"])
```

### Use the cloned voice

```python
# Synthesize speech with the cloned voice (reuses `client` and `voice` from above)
audio = client.tts.bytes(
    model_id="sonic-2",
    voice_id=voice["id"],
    transcript="Welcome to TokRepo. Let's walk through what's new this week.",
    output_format={"container": "mp3"},
)
```

### Voice library management

```python
# List all voices in your account
voices = client.voices.list()
for v in voices:
    print(v["id"], v["name"], v["is_owner"], v["is_starred"])

# Update metadata (for example, bump the version tag in the name)
client.voices.update(voice["id"], name="Brand Narrator - Sarah (v2)", description="...")

# Delete a voice you no longer use
client.voices.delete(voice["id"])
```

### Best practices for source audio

| Aspect | Recommendation |
|---|---|
| Length | 10-30 seconds (under 10 s, similarity drops; over 30 s, no further gain) |
| Content | Cover varied prosody: questions, statements, exclamations |
| Background | Silent room, or denoise the clip beforehand |
| Format | WAV, 16-bit, 24 kHz or higher (MP3 works, but lossy-compression artifacts can leak into the clone) |
| Avoid | Music, multiple speakers in one clip, heavy reverb, extreme audio compression |

### Consent and policy

Cartesia requires you to attest that the source voice is your own or that you have written permission from the voice's owner. The platform monitors for misuse; cloning public figures without consent is grounds for account termination. For commercial brand voices, have your legal team document the talent release agreement.

---

### FAQ

**Q: Similarity vs. stability mode?**
A: Similarity sticks closer to the source, so it suits character work where resemblance to the (consenting) speaker matters most. Stability smooths out variation, which is better for long-form narration, where source artifacts would otherwise compound. Default to stability in production unless you specifically need a close match to the source.

**Q: Can I clone in a language different from the source?**
A: Yes, clones work across languages. A 10 s English source clip can synthesize Spanish or French output that retains the speaker's vocal characteristics. How accurately the accent transfers varies, so test on representative content.

**Q: How big is my voice library quota?**
A: Free tier: 3 voices. Pro tier: 50. Scale tier: 500+. Cloned voices count toward the limit; pre-built voices do not.
Delete unused voices to reclaim slots.

---

## Source & Thanks

> Built by [Cartesia](https://github.com/cartesia-ai). Voice cloning docs at [docs.cartesia.ai/voices/clone](https://docs.cartesia.ai).
>
> [cartesia-ai/cartesia-python](https://github.com/cartesia-ai/cartesia-python)

---

Source: https://tokrepo.com/en/workflows/cartesia-voice-cloning-build-a-voice-library-from-audio

Author: Cartesia
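### Preflight-check a source clip

You can check the length and format rows of the source-audio best-practices table locally before uploading a clip. The sketch below is a hypothetical helper, not part of the Cartesia SDK; `check_sample` and `SampleReport` are illustrative names. It relies only on Python's standard-library `wave` module, so it reads uncompressed PCM WAV files only; an MP3 would need a separate decoder. It cannot judge noise, reverb, prosody, or speaker count, so listen to the clip for those.

```python
import wave
from dataclasses import dataclass, field

# Thresholds copied from the source-audio best-practices table.
# They are guidance for clone quality, not limits enforced by the Cartesia API.
MIN_SECONDS = 10.0            # shorter clips reportedly lose similarity
MAX_SECONDS = 30.0            # longer clips reportedly give no further gain
MIN_SAMPLE_RATE_HZ = 24_000
RECOMMENDED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit PCM


@dataclass
class SampleReport:
    """Basic properties of a WAV clip plus any deviations from the recommendations."""

    seconds: float
    sample_rate: int
    bits_per_sample: int
    warnings: list[str] = field(default_factory=list)


def check_sample(path: str) -> SampleReport:
    """Read a PCM WAV header and flag length, sample-rate, or bit-depth issues."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        width = wav.getsampwidth()

    report = SampleReport(
        seconds=frames / rate,
        sample_rate=rate,
        bits_per_sample=width * 8,
    )
    if report.seconds < MIN_SECONDS:
        report.warnings.append(
            f"clip is {report.seconds:.1f}s; aim for at least {MIN_SECONDS:.0f}s"
        )
    elif report.seconds > MAX_SECONDS:
        report.warnings.append(
            f"clip is {report.seconds:.1f}s; extra length past {MAX_SECONDS:.0f}s gives no further gain"
        )
    if rate < MIN_SAMPLE_RATE_HZ:
        report.warnings.append(
            f"sample rate is {rate} Hz; use {MIN_SAMPLE_RATE_HZ} Hz or higher"
        )
    if width != RECOMMENDED_SAMPLE_WIDTH:
        report.warnings.append(f"bit depth is {width * 8}-bit; export as 16-bit")
    return report
```

For a 12-second, 16-bit, 24 kHz clip, `check_sample("narrator-sample.wav").warnings` is an empty list. If any warnings come back, re-record or re-export the clip before calling `client.voices.clone(...)`.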