Zero-Shot Voice Style Transfer with Only Autoencoder Loss

AutoVC

This is an unofficial implementation of AutoVC based on the official one. The D-Vector and vocoder are from yistlin/dvector and yistLin/universal-vocoder respectively.

This implementation supports torch.jit, so the full model can be loaded with simply one line:

model = torch.jit.load(model_path)

Pre-trained models are available here.

Preprocessing

python preprocess.py    [--seg_len seg] [--n_workers workers]

data_dir: The directory of speakers.
save_dir: The directory to save the processed files.
encoder_path: The path of pre-trained D-Vector.
seg: The length of segments for training.
workers: The number of workers for preprocessing.

Training

python train.py    [--n_steps steps] [--save_steps save] [--log_steps log] [--batch_size batch] [--seg_len seg]

config_path: The config file of model hyperparameters.
data_dir: The directory of preprocessed data.

To finish reading, please visit source site