Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers
TL;DR: This post explains how to exploit specific properties of the Connectionist
Temporal Classification (CTC) architecture to achieve high-quality
automatic speech recognition (ASR), even on arbitrarily long files or
during live inference.
Wav2Vec2 is a popular pre-trained model for speech recognition.
Released in September 2020
by Meta AI Research, the novel architecture catalyzed progress in