# RVC — Retrieval-Based Voice Conversion Training & Inference

> Train custom voice conversion models with as little as 10 minutes of audio data using retrieval-based techniques for natural-sounding results.

## Quick Use

```bash
git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git
cd Retrieval-based-Voice-Conversion-WebUI
pip install -r requirements.txt
python infer-web.py
```

## Introduction

RVC is an open-source voice conversion framework that uses retrieval-based techniques to produce high-quality voice cloning with minimal training data. It enables users to train custom voice models from as little as 10 minutes of audio and perform real-time inference through a Gradio web interface.

## What RVC Does

- Trains voice conversion models from short audio clips using FAISS-based retrieval and HuBERT features
- Performs real-time voice conversion with low latency during inference
- Supports pitch shifting and formant preservation for natural output
- Provides one-click training with built-in data preprocessing and augmentation
- Includes batch audio conversion for processing multiple files at once

## Architecture Overview

RVC combines a HuBERT encoder for extracting speaker-independent content features with a FAISS index for retrieving the closest matching voice embeddings from the target speaker. The retrieved features are blended with predicted features and fed into a neural vocoder based on the VITS architecture to synthesize the output waveform. This retrieval-augmented approach reduces training requirements while maintaining voice quality.
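The retrieve-and-blend step described in the architecture overview can be illustrated with a small, self-contained sketch. This is not RVC's actual code: the function name `blend_with_retrieval`, the parameters `index_rate` and `k`, and the inverse-distance weighting are assumptions made for illustration, and a brute-force search in plain Python stands in for the FAISS index.

```python
def blend_with_retrieval(content, target_bank, index_rate=0.75, k=8):
    """Blend each content frame with its nearest target-speaker features.

    content:     list of feature vectors from the source audio
    target_bank: list of feature vectors collected from the target speaker
    index_rate:  0.0 keeps the input features, 1.0 uses only retrieved ones
    k:           number of nearest neighbours to average (hypothetical default)
    """
    blended = []
    for frame in content:
        # Squared Euclidean distance from this frame to every bank entry;
        # a real system would query a FAISS index instead of scanning.
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(frame, entry)), i)
            for i, entry in enumerate(target_bank)
        )[:k]
        # Closer neighbours get larger weights; epsilon avoids division by zero.
        weights = [1.0 / (dist + 1e-8) for dist, _ in scored]
        total = sum(weights)
        # Weighted average of the k retrieved feature vectors.
        retrieved = [
            sum(w * target_bank[i][dim] for w, (_, i) in zip(weights, scored)) / total
            for dim in range(len(frame))
        ]
        # Linear mix of retrieved and original features.
        blended.append(
            [index_rate * r + (1.0 - index_rate) * c for r, c in zip(retrieved, frame)]
        )
    return blended


bank = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]  # toy target-speaker features
frames = [[0.9, 1.1]]                           # toy source content features
print(blend_with_retrieval(frames, bank, index_rate=0.0))       # input left unchanged
print(blend_with_retrieval(frames, bank, index_rate=1.0, k=1))  # nearest bank entry only
```

In this sketch, raising `index_rate` pulls the output toward the target speaker's stored features, while lowering it keeps more of the features extracted from the input audio.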
## Self-Hosting & Configuration

- Requires Python 3.8+ with PyTorch and CUDA for GPU acceleration
- Download pretrained base models (HuBERT and RMVPE) on first launch
- Configure training parameters via the web UI including sample rate, epochs, and batch size
- Supports NVIDIA GPUs, plus CPU-only inference at reduced speed
- Logs and model checkpoints are saved to the local weights directory

## Key Features

- Minimal data requirement: train usable models from 10 minutes of audio
- Real-time voice conversion with adjustable pitch and index ratio
- Built-in RMVPE pitch extraction for improved accuracy over legacy methods
- Gradio-based web interface for training, inference, and model management
- Active community with extensive pretrained model ecosystem

## Comparison with Similar Tools

- **so-vits-svc** — Requires more training data and longer training times for comparable quality
- **DDSP-SVC** — Lighter weight but less natural output on complex voice timbres
- **OpenVoice** — Focuses on zero-shot cloning rather than fine-tuned per-speaker models
- **Bark** — Text-to-speech generation rather than voice-to-voice conversion

## FAQ

**Q: How much audio data do I need to train a model?**
A: A minimum of 10 minutes of clean speech is recommended, though 30+ minutes yields better results.

**Q: Can RVC run without a GPU?**
A: Yes, CPU inference is supported but significantly slower. Training on CPU is not practical.

**Q: Does RVC support real-time conversion?**
A: Yes, it supports real-time voice conversion with latency depending on hardware and buffer settings.

**Q: What audio formats are supported?**
A: WAV, MP3, FLAC, and other common formats are accepted. Audio is internally converted to WAV for processing.

## Sources

- https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
- https://docs.google.com/document/d/13ebnzmeEBc6uzYCMt-QVFQk-whVrK4zw8k7_Lw3Bv_A

---

Source: https://tokrepo.com/en/workflows/asset-a9007458
Author: Script Depot