# RVC — Retrieval-Based Voice Conversion Training & Inference

> Train custom voice conversion models with as little as 10 minutes of audio data using retrieval-based techniques for natural-sounding results.

## Quick Use
```bash
git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git
cd Retrieval-based-Voice-Conversion-WebUI
pip install -r requirements.txt
python infer-web.py
```

## Introduction
RVC is an open-source voice conversion framework that uses retrieval-based techniques to produce high-quality voice cloning with minimal training data. Users can train custom voice models from as little as 10 minutes of audio, manage training and inference through a Gradio web interface, and convert voices in real time.

## What RVC Does
- Trains voice conversion models from short audio clips using FAISS-based retrieval and HuBERT features
- Performs real-time voice conversion with low latency during inference
- Supports pitch shifting and formant preservation for natural output
- Provides one-click training with built-in data preprocessing and augmentation
- Includes batch audio conversion for processing multiple files at once
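
Pitch shifting in voice conversion is commonly expressed in semitones and applied to the extracted fundamental-frequency (F0) contour. The sketch below illustrates that arithmetic only; `shift_f0` is a hypothetical helper name, not a function from the RVC codebase, and it assumes unvoiced frames are marked with 0 Hz.

```python
def shift_f0(f0_hz, semitones):
    """Scale an F0 contour (in Hz) by a number of semitones; 12 semitones = 1 octave."""
    ratio = 2.0 ** (semitones / 12.0)
    # Assumption: unvoiced frames are stored as 0 Hz, so scaling leaves them at 0.
    return [f * ratio for f in f0_hz]


contour = [220.0, 0.0, 246.94]      # illustrative values, not real data
print(shift_f0(contour, 12))        # shift one octave up
print(shift_f0(contour, -12))       # shift one octave down
```

Positive values raise the pitch (for example when converting a lower voice to a higher one) and negative values lower it.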

## Architecture Overview
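RVC combines a HuBERT encoder, which extracts content features that are largely independent of the source speaker, with a FAISS index built from the target speaker's training features. At inference, each frame of source features is used to retrieve the closest matching features from the index. The retrieved features are blended with the original features according to the index ratio and fed into a VITS-based synthesizer that generates the output waveform. Because source features are pulled toward features seen in training, this retrieval-augmented approach reduces timbre leakage from the source speaker and lowers training requirements while maintaining voice quality.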
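
The retrieve-and-blend step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not RVC's implementation: a brute-force search stands in for the FAISS index, feature vectors are short lists instead of HuBERT outputs, and the names `nearest_neighbors`, `retrieve_and_blend`, and `index_ratio` are chosen for this example.

```python
def nearest_neighbors(query, bank, k):
    """Return the k (squared_distance, vector) pairs from bank closest to query."""
    scored = [(sum((q - b) ** 2 for q, b in zip(query, vec)), vec) for vec in bank]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


def retrieve_and_blend(frame, bank, index_ratio, k=2, eps=1e-8):
    """Blend one feature frame with a distance-weighted average of its neighbors."""
    neighbors = nearest_neighbors(frame, bank, k)
    # Closer neighbors get larger weights (inverse of the squared distance).
    weights = [1.0 / (dist + eps) for dist, _ in neighbors]
    total = sum(weights)
    retrieved = [
        sum(w * vec[i] for w, (_, vec) in zip(weights, neighbors)) / total
        for i in range(len(frame))
    ]
    # index_ratio = 1.0 keeps only retrieved features; 0.0 keeps the source frame.
    return [index_ratio * r + (1.0 - index_ratio) * f for r, f in zip(retrieved, frame)]


# Toy "target speaker" feature bank and one source frame (made-up numbers).
bank = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
print(retrieve_and_blend([0.9, 1.2], bank, index_ratio=0.75))
```

A higher index ratio pushes the output toward features seen in the target speaker's training data; a lower one preserves more of the source features.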

## Self-Hosting & Configuration
- Requires Python 3.8+ with PyTorch and CUDA for GPU acceleration
- Download pretrained base models (HuBERT and RMVPE) on first launch
- Configure training parameters via the web UI including sample rate, epochs, and batch size
- Runs on NVIDIA GPUs; CPU-only inference is also supported, at reduced speed
- Training logs and intermediate checkpoints are written under `logs/<experiment name>`; exported inference models are saved to the local weights directory
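
Before launching the web UI, it can help to confirm the requirements listed above. The following is a small preflight sketch, not part of RVC: `preflight` is a hypothetical helper that checks the Python version and, if PyTorch is importable, whether CUDA is available via `torch.cuda.is_available()`.

```python
import importlib.util
import sys


def preflight():
    """Report whether the interpreter and (optionally) PyTorch look usable."""
    report = {
        "python_ok": sys.version_info >= (3, 8),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "device": "cpu",  # fallback when no usable GPU is detected
    }
    if report["torch_installed"]:
        try:
            import torch  # imported lazily so the check also runs without PyTorch
            if torch.cuda.is_available():
                report["device"] = "cuda"
        except ImportError:
            # PyTorch is present but failed to load; treat it as unavailable.
            report["torch_installed"] = False
    return report


print(preflight())
```

A result with `"device": "cpu"` means inference should still work, but training is unlikely to be practical.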

## Key Features
- Minimal data requirement: train usable models from 10 minutes of audio
- Real-time voice conversion with adjustable pitch and index ratio
- Built-in RMVPE pitch extraction for improved accuracy over older options such as pm, harvest, and crepe
- Gradio-based web interface for training, inference, and model management
- Active community with extensive pretrained model ecosystem

## Comparison with Similar Tools
- **so-vits-svc** — Requires more training data and longer training times for comparable quality
- **DDSP-SVC** — Lighter weight but less natural output on complex voice timbres
- **OpenVoice** — Focuses on zero-shot cloning rather than fine-tuned per-speaker models
- **Bark** — Text-to-speech generation rather than voice-to-voice conversion

## FAQ
**Q: How much audio data do I need to train a model?**
A: A minimum of 10 minutes of clean speech is recommended, though 30+ minutes yields better results.
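
To check a dataset against this guideline, the total clip length can be summed with Python's standard `wave` module. This is a sketch that assumes the clips are uncompressed PCM `.wav` files in a single folder; `total_wav_minutes` and `enough_audio` are hypothetical helper names, not RVC functions.

```python
import wave
from pathlib import Path


def total_wav_minutes(folder):
    """Sum the duration, in minutes, of every .wav file directly inside folder."""
    seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        # The wave module reads PCM WAV only; other encodings raise wave.Error.
        with wave.open(str(path), "rb") as clip:
            seconds += clip.getnframes() / clip.getframerate()
    return seconds / 60.0


def enough_audio(folder, minimum_minutes=10.0):
    """True if the folder holds at least minimum_minutes of WAV audio."""
    return total_wav_minutes(folder) >= minimum_minutes
```

For example, `enough_audio("dataset/")` (with `dataset/` standing in for your own folder) returns `False` until the clips add up to at least 10 minutes.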

**Q: Can RVC run without a GPU?**
A: Yes, CPU inference is supported but significantly slower. Training on CPU is not practical.

**Q: Does RVC support real-time conversion?**
A: Yes, it supports real-time voice conversion with latency depending on hardware and buffer settings.
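
As a rough illustration of how buffer settings affect latency, the time to fill one audio block can be added to the model's processing time. The sketch below is a simplified estimate with made-up numbers; it ignores audio driver buffering and any crossfade, and the function names are hypothetical.

```python
def buffer_latency_ms(block_samples, sample_rate_hz):
    """Time needed just to fill one audio block, in milliseconds."""
    return 1000.0 * block_samples / sample_rate_hz


def estimated_latency_ms(block_samples, sample_rate_hz, inference_ms):
    """Rough lower bound: wait for a full block, then run the model on it."""
    return buffer_latency_ms(block_samples, sample_rate_hz) + inference_ms


# Illustrative values only: a 12000-sample block at 48 kHz plus 80 ms of inference.
print(estimated_latency_ms(block_samples=12000, sample_rate_hz=48000, inference_ms=80.0))
```

Smaller blocks reduce the buffering term, but they only help if the hardware can finish inference on each block before the next one arrives.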

**Q: What audio formats are supported?**
A: WAV, MP3, FLAC, and other common formats are accepted. Audio is internally converted to WAV for processing.

## Sources
- https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
- https://docs.google.com/document/d/13ebnzmeEBc6uzYCMt-QVFQk-whVrK4zw8k7_Lw3Bv_A

---
Source: https://tokrepo.com/en/workflows/asset-a9007458
Author: Script Depot