Machine Translation Weekly 89: BPE and Memorization
Similar to last week, I will discuss a paper about input segmentation. The paper is not directly about machine translation or multilinguality, but it brings interesting insights for Transformer models in general. The title of the paper is How BPE affects memorization in Transformers; it comes from authors at Facebook AI, and the preprint appeared on Thursday on arXiv. The paper presents a series of experiments with Transformer models for natural language inference and different sizes of the BPE-based vocabulary, by which they […]
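The experiments vary the size of the BPE vocabulary. As a rough illustration only (not the pipeline used in the paper), here is a minimal sketch of how BPE models with several vocabulary sizes could be trained with the sentencepiece library; the corpus path and the concrete vocabulary sizes are placeholders.

```python
# Minimal sketch: train BPE models of different vocabulary sizes with sentencepiece.
# "corpus.txt" and the sizes below are placeholders, not the paper's actual setup.
import sentencepiece as spm

for vocab_size in (500, 2000, 8000, 32000):  # hypothetical vocabulary sizes
    spm.SentencePieceTrainer.train(
        input="corpus.txt",              # plain-text training corpus (placeholder)
        model_prefix=f"bpe_{vocab_size}",
        vocab_size=vocab_size,
        model_type="bpe",                # BPE merges rather than the unigram LM
    )

# Smaller vocabularies split words into more, shorter subwords;
# larger ones keep frequent words as single tokens.
sp = spm.SentencePieceProcessor(model_file="bpe_500.model")
print(sp.encode("memorization in transformers", out_type=str))
```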