Neural Machine Translation

Machine Translation Weekly 62: The EDITOR

Papers about new models for sequence-to-sequence modeling have always been my favorite genre. This week I will talk about a model called EDITOR that was introduced in a pre-print of a paper that will appear in the TACL journal, by authors from the University of Maryland. The model is based on the Levenshtein Transformer, a partially non-autoregressive model for sequence-to-sequence learning. Autoregressive models generate the output left-to-right (or right-to-left), conditioning each step on the previously generated tokens. On the other […]
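To make the autoregressive half of that contrast concrete, here is a minimal sketch of greedy left-to-right decoding, where each step conditions on everything generated so far. The scoring function is a toy placeholder, not the EDITOR or the Levenshtein Transformer:

```python
# Minimal sketch of greedy autoregressive decoding.
# `next_token_scores` is a toy stand-in for a trained model's next-token
# distribution; it is NOT the EDITOR or Levenshtein Transformer.

def next_token_scores(source, prefix):
    """Toy scoring function: always prefers to copy the next source token."""
    position = len(prefix)
    if position < len(source):
        return {source[position]: 1.0, "<eos>": 0.1}
    return {"<eos>": 1.0}

def greedy_decode(source, max_len=20):
    prefix = []
    for _ in range(max_len):
        scores = next_token_scores(source, prefix)
        token = max(scores, key=scores.get)  # condition on everything generated so far
        if token == "<eos>":
            break
        prefix.append(token)
    return prefix

print(greedy_decode(["ein", "kleines", "Beispiel"]))
```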

Read more

Automatic Standardization of Colloquial Persian

The Iranian Persian language has two varieties: standard and colloquial. Most natural language processing tools for Persian assume that the text is in standard form; this assumption is wrong in many real applications, especially for web content… This paper describes a simple and effective standardization approach based on sequence-to-sequence translation. We design an algorithm for generating artificial parallel colloquial-to-standard data for learning a sequence-to-sequence model. Moreover, we annotate a publicly available evaluation dataset consisting of 1,912 sentences from a diverse set […]
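The excerpt does not reproduce the data-generation algorithm itself; as a hedged illustration of the general pattern (corrupting standard text with substitution rules to obtain synthetic colloquial sources), here is a toy sketch. The rules and the transliterated example are invented placeholders, not the paper's actual rules:

```python
# Illustrative sketch: generate artificial (colloquial, standard) training pairs
# by corrupting standard sentences with simple substitution rules.
# The rules below are made-up placeholders, not the rules used in the paper.

import re

# hypothetical standard -> colloquial substitutions (Latin transliteration for clarity)
RULES = [
    (r"\bmikhaham\b", "mikham"),   # "I want"
    (r"\bast\b", "e"),             # "is"
    (r"khane", "khune"),           # "house"
]

def to_colloquial(standard_sentence):
    colloquial = standard_sentence
    for pattern, replacement in RULES:
        colloquial = re.sub(pattern, replacement, colloquial)
    return colloquial

def make_parallel(standard_corpus):
    # (source, target) pairs for a colloquial -> standard seq2seq model
    return [(to_colloquial(s), s) for s in standard_corpus]

print(make_parallel(["man mikhaham be khane beravam"]))
```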

Read more

Globetrotter: Unsupervised Multilingual Translation from Visual Alignment

Multi-language machine translation without parallel corpora is challenging because there is no explicit supervision between languages. Existing unsupervised methods typically rely on topological properties of the language representations… We introduce a framework that instead uses the visual modality to align multiple languages, using images as the bridge between them. We estimate the cross-modal alignment between language and images, and use this estimate to guide the learning of cross-lingual representations. Our language representations are trained jointly in one model with a […]
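The excerpt does not spell out the training objective, but a common way to estimate cross-modal alignment between captions and images is a contrastive loss over matched embedding pairs. The sketch below shows that generic idea under this assumption; it is not the Globetrotter implementation:

```python
# Generic sketch of cross-modal contrastive alignment (not the Globetrotter code).
# Captions that describe the same image are pulled toward that image's embedding;
# mismatched caption/image pairs are pushed apart.

import numpy as np

def contrastive_loss(text_emb, image_emb, temperature=0.07):
    """InfoNCE-style loss over a batch of matched (text, image) embedding rows."""
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = text_emb @ image_emb.T / temperature        # pairwise similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                  # matched pairs on the diagonal

rng = np.random.default_rng(0)
texts, images = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(contrastive_loss(texts, images))
```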

Read more

On Knowledge Distillation for Direct Speech Translation

Direct speech translation (ST) has been shown to be a complex task requiring knowledge transfer from its sub-tasks: automatic speech recognition (ASR) and machine translation (MT). For MT, one of the most promising techniques to transfer knowledge is knowledge distillation… In this paper, we compare the different solutions to distill knowledge in a sequence-to-sequence task like ST. Moreover, we analyze potential drawbacks of this approach and how to alleviate them while maintaining the benefits in terms of translation quality.
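For readers unfamiliar with the technique, word-level knowledge distillation trains a student model on the teacher's soft token distributions instead of one-hot references. The following is a generic sketch of that loss with placeholder distributions, not the paper's setup:

```python
# Generic word-level knowledge distillation loss: cross-entropy between the
# teacher's soft token distribution and the student's predicted distribution,
# averaged over target positions. Not the paper's implementation.

import numpy as np

def word_level_kd_loss(teacher_probs, student_probs, eps=1e-9):
    """teacher_probs, student_probs: arrays of shape (target_len, vocab_size)."""
    return float(-np.mean(np.sum(teacher_probs * np.log(student_probs + eps), axis=1)))

vocab, length = 6, 3
rng = np.random.default_rng(0)
teacher = rng.dirichlet(np.ones(vocab), size=length)   # soft teacher distributions
student = rng.dirichlet(np.ones(vocab), size=length)   # student predictions
print(word_level_kd_loss(teacher, student))
```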

Read more

Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural Networks

The attention mechanism is a key component of the neural revolution in Natural Language Processing (NLP). As the size of attention-based models has been scaling with the available computational resources, a number of pruning techniques have been developed to detect and to exploit sparseness in such models in order to make them more efficient… The majority of such efforts have focused on looking for attention patterns and then hard-coding them to achieve sparseness, or pruning the weights of the attention […]
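As a concrete, simplified picture of exploiting sparseness in attention, one common family of methods keeps only the strongest attention logits per query and zeroes out the rest. The sketch below shows that generic top-k variant; it is not the data-informed global sparseness method proposed in the paper:

```python
# Generic sketch of sparsifying an attention distribution with a top-k mask.
# Illustrates the broad family of attention pruning/sparseness tricks, not the
# data-informed global sparseness method proposed in the paper.

import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def topk_sparse_attention(scores, k=2):
    """Keep only the k largest attention logits per query; zero out the rest."""
    kth_largest = np.sort(scores, axis=-1)[..., -k][..., None]
    masked = np.where(scores >= kth_largest, scores, -np.inf)  # -inf -> 0 after softmax
    return softmax(masked)

scores = np.array([[2.0, 0.1, -1.0, 0.5],
                   [0.3, 0.2, 3.0, -0.5]])
print(topk_sparse_attention(scores, k=2))
```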

Read more

Machine Translation Weekly 61: Decoding and diversity

This week I will comment on a short paper from Carnegie Mellon University and Amazon that presents a simple analysis of the diversity of machine translation outputs. The title of the paper is Decoding and Diversity in Machine Translation and it will be presented at the Resistance AI Workshop at NeurIPS 2020 (what a name for a workshop). The main thing the paper shows is that translation quality measured in terms of BLEU score correlates strongly negatively with […]
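Output diversity can be quantified in several ways; a standard, simple metric is the distinct-n ratio over a set of candidate translations. The sketch below uses that generic metric with made-up hypotheses and is not necessarily the measure used in the paper:

```python
# Generic distinct-n diversity metric over a set of candidate translations:
# the fraction of n-grams (pooled across hypotheses) that are unique.
# A common way to quantify output diversity; not necessarily the paper's measure.

def distinct_n(hypotheses, n=2):
    ngrams = []
    for hyp in hypotheses:
        tokens = hyp.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

beam_outputs = ["the cat sat on the mat", "the cat sat on a mat", "the cat sat on the rug"]
sampled_outputs = ["the cat sat on the mat", "a kitten rested on the rug", "one cat lay on a mat"]
print(distinct_n(beam_outputs), distinct_n(sampled_outputs))
```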

Read more

MK-SQuIT: Synthesizing Questions using Iterative Template-filling

The aim of this work is to create a framework for synthetically generating question/query pairs with as little human input as possible. These datasets can be used to train machine translation systems to convert natural language questions into queries, a useful tool that could allow for more natural access to database information… Existing methods of dataset generation require human input that scales linearly with the size of the dataset, resulting in small datasets. Aside from a short initial configuration task, […]
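As a hedged sketch of the general template-filling pattern the excerpt describes (invented templates, slots, and entities, not the actual MK-SQuIT pipeline, which derives typed templates from a knowledge base):

```python
# Toy sketch of template filling for question/query pair generation.
# Templates, slots, and entities below are invented placeholders; the real
# MK-SQuIT pipeline builds typed templates from a knowledge graph.

import itertools

TEMPLATES = [
    ("Who directed [FILM]?",
     "SELECT ?d WHERE { [FILM] directedBy ?d . }"),
    ("What is the population of [CITY]?",
     "SELECT ?p WHERE { [CITY] population ?p . }"),
]

ENTITIES = {"[FILM]": ["Alien", "Heat"], "[CITY]": ["Oslo", "Lima"]}

def generate_pairs():
    for question, query in TEMPLATES:
        slot = next(s for s in ENTITIES if s in question)
        for entity in ENTITIES[slot]:
            yield question.replace(slot, entity), query.replace(slot, entity)

for natural_question, formal_query in itertools.islice(generate_pairs(), 4):
    print(natural_question, "->", formal_query)
```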

Read more

AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations

In this work, we present the construction of multilingual parallel corpora with annotation of multiword expressions (MWEs). MWEs include verbal MWEs (vMWEs) defined in the PARSEME shared task that have a verb as the head of the studied terms… The annotated vMWEs are also bilingually and multilingually aligned manually. The languages covered include English, Chinese, Polish, and German. Our original English corpus is taken from the PARSEME shared task in 2018. We performed machine translation of this source corpus followed […]

Read more

Incorporating a Local Translation Mechanism into Non-autoregressive Translation

In this work, we introduce a novel local autoregressive translation (LAT) mechanism into non-autoregressive translation (NAT) models so as to capture local dependencies among target outputs. Specifically, for each target decoding position, instead of only one token, we predict a short sequence of tokens in an autoregressive way… We further design an efficient merging algorithm to align and merge the output pieces into one final output sequence. We integrate LAT into the conditional masked language model (CMLM; Ghazvininejad et al., 2019) […]
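As a toy illustration of what merging overlapping locally decoded pieces can look like (a greedy overlap merge; the paper's actual alignment-and-merge algorithm is more involved):

```python
# Toy illustration of merging short, overlapping token pieces predicted at
# consecutive positions into one output sequence. The real LAT merging
# algorithm aligns pieces more carefully; this is only a sketch.

def merge_pieces(pieces):
    """Greedily append each piece, skipping its overlap with what is already merged."""
    merged = []
    for piece in pieces:
        overlap = 0
        for k in range(min(len(merged), len(piece)), 0, -1):
            if merged[-k:] == piece[:k]:
                overlap = k
                break
        merged.extend(piece[overlap:])
    return merged

pieces = [["we", "introduce", "a"],
          ["introduce", "a", "novel"],
          ["a", "novel", "mechanism"]]
print(merge_pieces(pieces))   # ['we', 'introduce', 'a', 'novel', 'mechanism']
```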

Read more

Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling

Pre-training models on vast quantities of unlabeled data has emerged as an effective approach to improving accuracy on many NLP tasks. On the other hand, traditional machine translation has a long history of leveraging unlabeled data through noisy channel modeling… The same idea has recently been shown to achieve strong improvements for neural machine translation. Unfortunately, naïve noisy channel modeling with modern sequence-to-sequence models is up to an order of magnitude slower than alternatives. We address this issue […]
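Noisy channel decoding itself is standard: a candidate translation y of source x is scored via Bayes' rule by combining a channel (reverse translation) model p(x|y) with a language model p(y). The sketch below shows that generic reranking score with placeholder log-probabilities; the paper's contribution is making this fast online, which the sketch does not capture:

```python
# Generic noisy channel reranking: score each candidate translation y of source x by
#   score(y) = w_channel * log p(x | y)  (channel / reverse translation model)
#            + w_lm      * log p(y)      (language model)
# The log-probabilities below are placeholder numbers, not real model outputs.

def noisy_channel_score(log_p_x_given_y, log_p_y, channel_weight=1.0, lm_weight=0.5):
    return channel_weight * log_p_x_given_y + lm_weight * log_p_y

candidates = {
    "candidate A": {"log_p_x_given_y": -4.1, "log_p_y": -9.0},
    "candidate B": {"log_p_x_given_y": -5.0, "log_p_y": -6.5},
}

best = max(candidates, key=lambda c: noisy_channel_score(**candidates[c]))
print(best)
```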

Read more