Adding Benchmaxxer Repellant to the Open ASR Leaderboard
“When a measure becomes a target, it ceases to be a good measure.” (Goodhart’s Law)
TLDR: Appen Inc. and DataoceanAI have provided high-quality English ASR datasets covering scripted and conversational speech across multiple accents. To reduce the risk of benchmaxxing and test-set contamination, we will keep these datasets private so that they remain a reliable measure of performance on multiple tasks. We’re not updating the average WER at this time: by default, the leaderboard’s Average WER is still computed on public datasets only. You […]