Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers

TL;DR: This post explains how to exploit the specific properties of the
Connectionist Temporal Classification (CTC) architecture to achieve
high-quality automatic speech recognition (ASR), even on arbitrarily long
files or during live inference.

Wav2Vec2 is a popular pre-trained model for speech recognition. Released
in September 2020 by Meta AI Research, the novel architecture catalyzed
progress in
