Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR

Facebook NLP Research

Abstract

In this work, to measure the accuracy and efficiency of a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations of three popular training criteria: LF-MMI, CTC and RNN-T. Transcribing social media videos in 7 languages, with training sets ranging from 3K to 14K hours, we conduct large-scale controlled experiments across all three criteria using identical datasets and the same encoder model architecture. We find that RNN-T has consistent wins
