Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers

Nicolas Patry

Tl;dr: This post explains how to exploit specific properties of the
Connectionist Temporal Classification (CTC) architecture to achieve
high-quality automatic speech recognition (ASR), even on arbitrarily long
files or during live inference.

Wav2Vec2 is a popular pre-trained model for speech recognition.
Released in September 2020
by Meta AI Research, the novel architecture catalyzed progress in

 

 

 
