Why don’t people use character-level MT? – One year later

In this post, I comment on our (i.e., myself, Helmut Schmid and Alex Fraser)
year-old paper "Why don't people use character-level machine
translation?," published in Findings of ACL. Here, I
will (besides briefly summarizing the paper’s main message) mostly comment on
what I learned while working on the one-year-later perspective, focusing more
on what I would do differently now. If you are interested in the exact research
content, read the paper or
watch a 5-minute video.
Paper TL;DR

Doing character-level MT is mostly not a good idea. The systems are slow, and
they do not have better translation quality. Stuff that used to work on par
with subwords when using RNNs does not work with Transformers, and
architectures that work well for encoder-only models do not transfer to the
full encoder-decoder setup that MT requires.
