Machine Translation Weekly 56: Beam Search and Models’ Surprisal

Last year, the EMNLP paper “On NMT Search Errors and Model Errors: Cat Got Your Tongue?” (which I discussed in MT Weekly 20) showed a mind-blowing property of neural machine translation models: the most probable target sentence is not necessarily the best target sentence.

In NMT, we model the target sentence probability, which is factorized using the chain rule into conditional token probabilities. We can imagine target sentence generation like this: the model estimates the probability distribution of the first word given the source sentence. From this distribution, we pick one word. The model then estimates the distribution of the second word given the first word and the source sentence. We select the second word from this distribution, and so on…
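
In other words, the chain rule gives P(y | x) = P(y1 | x) · P(y2 | y1, x) · P(y3 | y1, y2, x) · … Below is a minimal sketch of this step-by-step generation in Python. The `next_token_distribution` function is a hypothetical stand-in for the model: it is assumed to return the conditional probability of every candidate token given the source sentence and the target prefix generated so far.

```python
import math


def greedy_decode(next_token_distribution, source, eos="</s>", max_len=100):
    """Generate a target sentence token by token.

    `next_token_distribution(source, prefix)` is assumed to return a dict
    mapping every candidate token to its conditional probability given the
    source sentence and the target prefix generated so far.
    """
    prefix = []
    log_prob = 0.0
    for _ in range(max_len):
        distribution = next_token_distribution(source, prefix)
        # Pick the single most probable next token; the sentence probability
        # is the product of the conditional probabilities, accumulated here
        # as a sum of logs.
        token, prob = max(distribution.items(), key=lambda item: item[1])
        prefix.append(token)
        log_prob += math.log(prob)
        if token == eos:
            break
    return prefix, log_prob
```

The returned score is the log-probability the model assigns to the whole sentence; the question of search is which target sentence maximizes it.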

Previously, we thought that exact inference, i.e., finding the target sentence to which the model assigns the highest probability, was what we should ideally do, and that beam search is only an approximation we accept because exact search is computationally intractable. The paper showed that this assumption does not hold.
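
Beam search, the standard approximation, can be sketched over the same hypothetical interface as above. This is not the implementation of any particular toolkit; it only illustrates the idea of expanding the `beam_size` highest-scoring prefixes at every step and setting aside hypotheses that reach the end-of-sentence token.

```python
import math


def beam_search(next_token_distribution, source, beam_size=5, eos="</s>", max_len=100):
    """Keep the `beam_size` best partial translations instead of a single one."""
    beam = [([], 0.0)]  # (prefix, log-probability) pairs
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beam:
            distribution = next_token_distribution(source, prefix)
            for token, prob in distribution.items():
                if prob > 0.0:
                    candidates.append((prefix + [token], score + math.log(prob)))
        # Prune to the best-scoring candidates; hypotheses ending with the
        # end-of-sentence token are set aside as finished.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beam.append((prefix, score))
        if not beam:
            break
    return max(finished or beam, key=lambda item: item[1])
```

The larger the beam, the closer the result gets to exact inference, which, as the paper showed, does not necessarily mean better translations.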