Machine Translation Weekly 84: Order Agnostic Cross-Entropy
I tend to be a little biased against autoregressive models. The way they operate just does not sound natural to me: say exactly one subword, think for a while, then say exactly one more subword. Moreover, with current models, a subword can be anything from a single character to a word as long as “Ausgußreiniger”. Non-autoregressive models generate everything in a single step. That does not seem entirely natural either, but at least they offer an interesting alternative. […]
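To make the contrast concrete, here is a minimal sketch of the two decoding styles. The `model` interfaces are hypothetical stand-ins, not code from the paper or the post: an autoregressive decoder emits one subword per step and feeds it back in, while a non-autoregressive decoder fills every position in one forward pass.

```python
import torch

def autoregressive_decode(model, src, bos_id, eos_id, max_len=50):
    """One subword per step; each prediction is fed back as input."""
    tokens = [bos_id]
    for _ in range(max_len):
        # hypothetical signature: model(src, prefix) -> logits [1, len, vocab]
        logits = model(src, torch.tensor([tokens]))
        next_id = int(logits[0, -1].argmax())  # "say exactly one subword"
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

def non_autoregressive_decode(model, src, tgt_len):
    """All subwords in a single parallel step."""
    # hypothetical signature: model(src, target_length) -> logits [1, tgt_len, vocab]
    logits = model(src, target_length=tgt_len)
    return logits.argmax(dim=-1)[0].tolist()  # every position at once
```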