Python scripts for a speech processing pipeline with Voice Activity Detection (VAD)

Python scripts for a speech processing pipeline with Voice Activity Detection (VAD), Spoken Language Identification (SLI), and Automatic Speech Recognition (ASR). Our use case involves using VAD to detect time regions in a language documentation recording where someone is speaking, then using SLI to classify each region as either English (eng) or Muruwari (zmu), and then using an English ASR model to transcribe regions detected as English. This pipeline outputs an ELAN .eaf file with the following tier structure (_vad, […]
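For concreteness, here is a minimal sketch of the output stage of such a pipeline, assuming the pympi-ling package for writing ELAN .eaf files; detect_speech, identify_language and transcribe_english are hypothetical placeholders for the VAD, SLI and ASR models, and tier names other than _vad (and all times, given in milliseconds) are illustrative.

    import pympi  # pympi-ling, for writing ELAN .eaf files

    # Hypothetical stand-ins for the VAD, SLI and ASR models.
    def detect_speech(wav_path):
        return [(0, 2000), (2500, 5000)]       # placeholder speech regions (ms)

    def identify_language(wav_path, start, end):
        return "eng"                           # placeholder decision: "eng" or "zmu"

    def transcribe_english(wav_path, start, end):
        return "example transcription"         # placeholder ASR output

    def build_eaf(wav_path, eaf_path):
        eaf = pympi.Elan.Eaf()
        eaf.add_linked_file(wav_path)
        for tier in ("_vad", "_sli", "_asr"):  # tier names beyond _vad are assumed
            eaf.add_tier(tier)
        for start, end in detect_speech(wav_path):
            eaf.add_annotation("_vad", start, end, "speech")
            lang = identify_language(wav_path, start, end)
            eaf.add_annotation("_sli", start, end, lang)
            if lang == "eng":                  # transcribe only the English regions
                eaf.add_annotation("_asr", start, end, transcribe_english(wav_path, start, end))
        eaf.to_file(eaf_path)

    build_eaf("recording.wav", "recording.eaf")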

Read more

Voice Based Personal Assistant using natural language processing

We have built a voice-based personal assistant that lets people access files on their device hands-free using natural language processing. The assistant can open folders on the user's voice commands, open files within those folders, find words in a chosen file, and create a new file containing the pages where the word the user was looking for appears. We built this project in Python using NLP and libraries like […]
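As a rough illustration of this command flow, here is a minimal sketch assuming the SpeechRecognition library for capturing voice commands; the command phrases, file paths, and the line-based search (standing in for the page-based feature) are simplifying assumptions, not the project's actual implementation.

    import os
    import speech_recognition as sr  # SpeechRecognition library

    recognizer = sr.Recognizer()

    def listen_for_command():
        # Capture one utterance from the microphone and transcribe it
        # with Google's free web recognizer.
        with sr.Microphone() as source:
            audio = recognizer.listen(source)
        return recognizer.recognize_google(audio).lower()

    def handle(command):
        if command.startswith("open folder"):
            folder = command.removeprefix("open folder").strip()
            print(os.listdir(folder))          # show the folder's contents
        elif command.startswith("find word"):
            rest = command.removeprefix("find word").strip()
            word, _, path = rest.partition(" in ")
            with open(path.strip(), encoding="utf-8") as f:
                hits = [line for line in f if word in line]
            # Write the matches to a new file, approximating the
            # "new file containing the pages with the word" feature.
            with open("matches.txt", "w", encoding="utf-8") as out:
                out.writelines(hits)

    if __name__ == "__main__":
        handle(listen_for_command())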

Read more

A package that assists in creating Voice Applications for Magenta Voice Platform

Magenta Voice Skill SDK for Python is a package that assists in creating Voice Applications for the Magenta Voice Platform. About: this is a reworked stack with explicit async/await concurrency, based on the FastAPI ASGI framework; the old stable (Bottle/Gevent) 0.xx branch is still available. Installation: runtime installation: python -m pip install skill-sdk. Runtime (full) installation with the Prometheus metrics exporter and distributed tracing adapter: python -m pip install skill-sdk[all]. Development installation: python -m pip install skill-sdk[dev]. Quickstart: to bootstrap a new project, […]

Read more

A Diverse and Non-parallel Framework for Natural-Sounding Voice Conversion

Yinghao Aaron Li, Ali Zare, Nima Mesgarani. We present an unsupervised non-parallel many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2. Using a combination of adversarial source classifier loss and perceptual loss, our model significantly outperforms previous VC models. Although our model is trained only with 20 English speakers, it generalizes to a variety of voice conversion tasks, such as any-to-many, cross-lingual, and singing conversion. Using a style encoder, our framework can also convert plain […]

Read more

Zero-Shot Voice Style Transfer with Only Autoencoder Loss

AutoVC: This is an unofficial implementation of AutoVC based on the official one. The D-Vector and vocoder are from yistlin/dvector and yistLin/universal-vocoder respectively. This implementation supports torch.jit, so the full model can be loaded with a single line: model = torch.jit.load(model_path). Pre-trained models are available here. Preprocessing: python preprocess.py <data_dir> <save_dir> <encoder_path> [--seg_len seg] [--n_workers workers]. data_dir: the directory of speakers. save_dir: the directory to save the processed files. encoder_path: the path of the pre-trained D-Vector. seg: the length of segments for training. workers: […]
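As a usage sketch, the TorchScript checkpoint can be loaded as described above; the input signature below (a source mel-spectrogram plus source and target d-vector embeddings, with the shown shapes) is an assumption for illustration, not the repository's documented interface.

    import torch

    # Load the exported TorchScript AutoVC model (path is illustrative).
    model = torch.jit.load("autovc_model.pt", map_location="cpu")
    model.eval()

    # Assumed inputs: a source mel-spectrogram and source/target speaker
    # embeddings (d-vectors); shapes and dimensions are placeholders.
    mel_src = torch.randn(1, 128, 80)   # (batch, frames, n_mels)
    emb_src = torch.randn(1, 256)       # source d-vector
    emb_trg = torch.randn(1, 256)       # target d-vector

    with torch.no_grad():
        mel_converted = model(mel_src, emb_src, emb_trg)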

Read more

Unofficial PyTorch implementation of Google AI’s VoiceFilter system

Hi everyone! It’s Seung-won from MINDs Lab, Inc. It’s been a long time since I released this open-source project, and I didn’t expect the repository to attract so much attention for such a long time. I would like to thank everyone for that attention, and also Mr. Quan Wang (the first author of the VoiceFilter paper) for referencing this project in his paper. Actually, this project was done by me when it was only 3 months after I […]

Read more