Introduction to Hugging Face’s Transformers v4.3.0 and its First Automatic Speech Recognition Model – Wav2Vec2

Overview

  • Hugging Face has released Transformers v4.3.0 and it introduces the first Automatic Speech Recognition model to the library: Wav2Vec2
  • Using one hour of labeled data, Wav2Vec2 outperforms the previous state of the art on the 100-hour subset while using 100 times less labeled data
  • Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data Wav2Vec2 achieves 4.8/8.2 WER
  • Understand Wav2Vec2 implementation using transformers library on audio to text generation

 

Introduction

Transformers has been a driving point for breakthrough developments in the Audio and Speech processing domain. And Hugging Face has no plans to stop its growing applications. Hugging Face just dropped the State-of-the-art Natural Language Processing library Transformers v4.30 and it has extended its reach to Speech Recognition by adding one of the leading Automatic Speech Recognition models by Facebook called the Wav2Vec2.

Transformers v4

Leave a Reply