Towards Fast, Controllable and Lightweight Text-to-Speech synthesis
FCL-Taco2 Block diagram of FCL-taco2, where the decoder generates mel-spectrograms in AR mode within each phoneme and is shared for all phonemes. Training and inference scripts for FCL-taco2 Environment python 3.6.10 torch 1.3.1 chainer 6.0.0 espnet 8.0.0 apex 0.1 numpy 1.19.1 kaldiio 2.15.1 librosa 0.8.0 Training and inference: Step1. Data preparation & preprocessing Download LJSpeech Unpack downloaded LJSpeech-1.1.tar.bz2 to /xx/LJSpeech-1.1 Obtain the forced alignment information by using Montreal forced aligner tool. Or you can download our alignment results, then unpack […]
Read more