Zero-Shot Voice Style Transfer with Only Autoencoder Loss

AutoVC

This is an unofficial implementation of AutoVC based on the official one. The D-Vector and vocoder are from yistlin/dvector and yistLin/universal-vocoder respectively.

This implementation supports torch.jit, so the full model can be loaded with simply one line:

model = torch.jit.load(model_path)

Pre-trained models are available here.

Preprocessing

python preprocess.py    [--seg_len seg] [--n_workers workers]
  • data_dir: The directory of speakers.
  • save_dir: The directory to save the processed files.
  • encoder_path: The path of pre-trained D-Vector.
  • seg: The length of segments for training.
  • workers: The number of workers for preprocessing.

Training

python train.py    [--n_steps steps] [--save_steps save] [--log_steps log] [--batch_size batch] [--seg_len seg]