Issue #64 – Neural Machine Translation with Byte-Level Subwords

13 Dec19 Issue #64 – Neural Machine Translation with Byte-Level Subwords Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic In order to limit vocabulary, most neural machine translation engines are based on subwords. In some settings, character-based systems are even better (see issue #60). However, rare characters in noisy data or character-based languages can unnecessarily take up vocabulary slots and limit its compactness. In this post we take a look at an alternative, proposed by Wang et al. (2019), […]