Cartesia voice cloning creates a custom voice from a 5 to 30 second audio sample. You can upload, save, version, and share voices within your account. Consent handling is built in.
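Because the sample must fall in the 5 to 30 second range, it can help to check a clip's length locally before uploading. The sketch below is a minimal illustration using only Python's standard `wave` module; it assumes the sample is a WAV file, and the names `is_valid_clone_sample` and `make_silent_wav` are hypothetical helpers, not part of any Cartesia SDK.

```python
import io
import wave

MIN_SECONDS = 5.0   # lower bound of the sample length stated above
MAX_SECONDS = 30.0  # upper bound of the sample length stated above


def wav_duration_seconds(wav_file) -> float:
    """Return the duration of a WAV file given a path or a file-like object."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_clone_sample(wav_file) -> bool:
    """True if the clip length falls inside the 5 to 30 second window."""
    return MIN_SECONDS <= wav_duration_seconds(wav_file) <= MAX_SECONDS


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Hypothetical test helper: build an in-memory mono 16-bit silent WAV."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf
```

For a real recording, pass the file path instead of an in-memory buffer, for example `is_valid_clone_sample("my_voice.wav")`. Other audio formats would need a different decoder.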
Cartesia's streaming WebSocket accepts LLM text chunks and returns audio at the same time, so synthesis starts before the LLM has finished generating. This is required for sub-second voice agent round-trips.
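The "text in, audio out at the same time" pattern can be sketched as three concurrent tasks joined by queues. This is only an illustration of the concurrency shape: `llm_text_chunks` and `fake_tts` are hypothetical stand-ins that fake the LLM and the TTS service, and nothing here calls or mirrors the actual Cartesia WebSocket API.

```python
import asyncio


async def llm_text_chunks():
    """Hypothetical stand-in for a streaming LLM: yields text as it is generated."""
    for chunk in ["Hello, ", "how can ", "I help?"]:
        await asyncio.sleep(0.01)  # simulate generation delay
        yield chunk


async def fake_tts(text: str) -> bytes:
    """Hypothetical stand-in for the TTS service; NOT the Cartesia API."""
    await asyncio.sleep(0.01)  # simulate synthesis delay
    return text.encode("utf-8")  # placeholder bytes instead of real audio


async def send_text(text_queue: asyncio.Queue) -> None:
    """Forward each LLM chunk as soon as it exists, then signal the end."""
    async for chunk in llm_text_chunks():
        await text_queue.put(chunk)
    await text_queue.put(None)  # sentinel: no more text


async def synthesize(text_queue: asyncio.Queue, audio_queue: asyncio.Queue) -> None:
    """Turn text chunks into audio chunks while more text is still arriving."""
    while (chunk := await text_queue.get()) is not None:
        await audio_queue.put(await fake_tts(chunk))
    await audio_queue.put(None)  # sentinel: no more audio


async def play_audio(audio_queue: asyncio.Queue, played: list) -> None:
    """Consume audio chunks in order; a real agent would write to a speaker."""
    while (audio := await audio_queue.get()) is not None:
        played.append(audio)


async def run_pipeline() -> list:
    text_queue, audio_queue = asyncio.Queue(), asyncio.Queue()
    played = []
    # All three stages run concurrently, so audio for the first chunk can be
    # played while later text chunks are still being generated.
    await asyncio.gather(
        send_text(text_queue),
        synthesize(text_queue, audio_queue),
        play_audio(audio_queue, played),
    )
    return played
```

Running `asyncio.run(run_pipeline())` returns the placeholder audio chunks in the order the text was produced. In a real integration, the middle stage would be replaced by sending to and receiving from the WebSocket.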
Cartesia Sonic is a state-space-model TTS with 75 ms time-to-first-audio. It offers 100+ voices, voice cloning from as little as 5 seconds of audio, and a streaming WebSocket. It is positioned as the lowest-latency TTS.
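Time-to-first-audio is the delay between starting a request and receiving the first audio chunk. A minimal way to measure it against any asynchronous audio stream might look like the sketch below; `fake_audio_stream` is a hypothetical stand-in with an artificial delay, so the numbers it produces say nothing about Sonic's real latency.

```python
import asyncio
import time


async def fake_audio_stream(first_chunk_delay: float):
    """Hypothetical stand-in for a streaming TTS response."""
    await asyncio.sleep(first_chunk_delay)  # simulated wait before first audio
    yield b"\x00" * 320
    yield b"\x00" * 320


async def time_to_first_audio(stream):
    """Return (seconds until the first chunk arrived, the first chunk).

    The clock starts when iteration begins, which for a lazy async
    generator is also when the simulated request starts.
    """
    start = time.perf_counter()
    async for chunk in stream:
        return time.perf_counter() - start, chunk
    raise RuntimeError("stream ended without producing audio")
```

Usage: `ttfa, first = asyncio.run(time_to_first_audio(fake_audio_stream(0.05)))`. Against a real service, network round-trip time is included in the measurement, so results depend on where the client runs.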