Machine Translation Weekly 84: Order Agnostic Cross-Entropy

I tend to be a little biased against autoregressive models. The way they
operate just does not sound natural to me: they say exactly one subword, think
for a while, and then say exactly one more subword. Moreover, with current
models, a subword can be anything from a single character to a word as long as
“Ausgußreiniger”.
Non-autoregressive models generate everything in a single step. That does not
seem very natural either, but at least they offer an interesting alternative.
Hopefully, one day, we will have something in between these two extremes.
Because that day has not come yet, today I am going to comment on a paper that
introduces an interesting loss function for non-autoregressive MT,
which might be a small step in this direction. The title of the paper is
Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation.
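
Judging from the title, the idea appears to be that the model is not penalized
for producing the reference tokens in a different order than the reference:
the cross-entropy is computed against the best-matching assignment of
reference tokens to output positions. Below is a minimal sketch of what such
an order-agnostic cross-entropy could look like, assuming the best one-to-one
assignment is found with the Hungarian algorithm (SciPy's
`linear_sum_assignment`); the function name and the toy dimensions are my own
illustration, not the paper's code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def order_agnostic_xent(log_probs: np.ndarray, reference: np.ndarray) -> float:
    """Cross-entropy under the best permutation of the reference tokens.

    log_probs: (target_length, vocab_size) log-probabilities from a
        non-autoregressive decoder, one distribution per output position.
    reference: (target_length,) reference token ids.
    """
    # Cost of placing reference token j at output position i.
    cost = -log_probs[:, reference]           # shape: (positions, tokens)
    rows, cols = linear_sum_assignment(cost)  # minimal-cost bipartite matching
    return cost[rows, cols].mean()


# Toy usage: 3 output positions over a vocabulary of 5 subwords.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
print(order_agnostic_xent(log_probs, np.array([4, 1, 2])))
```

Note that with the identity assignment this reduces to the standard
per-position cross-entropy, so the order-agnostic value can never be higher
than the usual loss.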
