Incorporating a Local Translation Mechanism into Non-autoregressive Translation

In this work, we introduce a novel local autoregressive translation (LAT) mechanism into non-autoregressive translation (NAT) models so as to capture local dependencies among target outputs. Specifically, for each target decoding position, instead of only one token, we predict a short sequence of tokens in an autoregressive way… We further design an efficient merging algorithm to align and merge the output pieces into one final output sequence. We integrate LAT into the conditional masked language model (CMLM; Ghazvininejad et al., 2019) […]
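The abstract does not spell out the merging step, so below is a minimal Python sketch of one way overlapping pieces might be aligned and merged. The greedy largest-overlap heuristic and all names here (merge_pieces, pieces) are assumptions for illustration, not the paper's actual algorithm.

# Illustrative sketch only: merge overlapping token pieces into one sequence.
# The paper's merging algorithm may differ; this greedy overlap heuristic
# is an assumption chosen for clarity.

def merge_pieces(pieces):
    """Greedily align each piece against the tail of the running output."""
    merged = list(pieces[0])
    for piece in pieces[1:]:
        # Find the longest suffix of `merged` matching a prefix of `piece`.
        overlap = 0
        for k in range(min(len(merged), len(piece)), 0, -1):
            if merged[-k:] == piece[:k]:
                overlap = k
                break
        merged.extend(piece[overlap:])
    return merged

# Example: three locally autoregressive pieces predicted at adjacent positions.
pieces = [["the", "cat", "sat"], ["cat", "sat", "on"], ["sat", "on", "mat"]]
print(merge_pieces(pieces))  # ['the', 'cat', 'sat', 'on', 'mat']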


Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling

Pre-training models on vast quantities of unlabeled data has emerged as an effective approach to improving accuracy on many NLP tasks. On the other hand, traditional machine translation has a long history of leveraging unlabeled data through noisy channel modeling… The same idea has recently been shown to achieve strong improvements for neural machine translation. Unfortunately, naïve noisy channel modeling with modern sequence-to-sequence models is up to an order of magnitude slower than alternatives. We address this issue […]
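As a reminder of the underlying idea, here is a minimal Python sketch of noisy channel reranking over n-best candidates via Bayes' rule, p(y|x) ∝ p(x|y)·p(y). The scoring functions, interpolation weights, and names are assumptions for illustration; the paper's actual contribution (making this fast for online decoding) is not shown.

# Illustrative sketch only: rerank n-best candidates with noisy channel
# scoring. All function names and weights are hypothetical placeholders.

def noisy_channel_rerank(source, candidates,
                         direct_logprob,   # log p(y|x): direct model
                         channel_logprob,  # log p(x|y): reverse (channel) model
                         lm_logprob,       # log p(y): language model
                         lam=0.5, mu=0.5):
    def score(y):
        # Bayes' rule: p(y|x) ∝ p(x|y) · p(y); keeping the direct model
        # score as an extra feature is a common practical choice.
        return (direct_logprob(source, y)
                + lam * channel_logprob(y, source)
                + mu * lm_logprob(y))
    return max(candidates, key=score)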


Machine Translation of Novels in the Age of Transformer

In this chapter we build a machine translation (MT) system tailored to the literary domain, specifically to novels, based on the state-of-the-art architecture in neural MT (NMT), the Transformer (Vaswani et al., 2017), for the translation direction English-to-Catalan. Subsequently, we assess to what extent such a system can be useful by evaluating its translations against those of three other systems (two domain-specific systems under the recurrent and phrase-based paradigms and a popular generic online system) on three […]


Learning to Use Future Information in Simultaneous Translation

Simultaneous neural machine translation (NMT) has attracted much attention recently. In contrast to standard NMT, where the system can access the full input sentence, simultaneous NMT is a prefix-to-prefix problem: the system can only utilize a prefix of the input sentence, which introduces more uncertainty and difficulty into decoding… Wait-k inference is a simple yet effective strategy for simultaneous NMT, where the decoder generates the output sequence $k$ words behind the input words. For wait-k […]
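To make the wait-k strategy concrete, here is a minimal Python sketch of the read/write schedule it implies: read the first $k$ source tokens, then alternate between emitting one target token and reading one more source token. predict_next stands in for an arbitrary NMT model and, like the EOS handling, is an assumption for illustration rather than the paper's implementation.

# Illustrative sketch only: the wait-k read/write schedule.
# `predict_next(src_prefix, target)` is a hypothetical stand-in for any
# model that returns the next target token given the current prefixes.

def wait_k_decode(source_stream, predict_next, k=3, eos="</s>"):
    src_prefix, target = [], []
    for token in source_stream:
        src_prefix.append(token)
        if len(src_prefix) >= k:
            y = predict_next(src_prefix, target)  # emit k words behind input
            if y == eos:
                return target
            target.append(y)
    # Source exhausted: keep generating until the model emits EOS.
    while (y := predict_next(src_prefix, target)) != eos:
        target.append(y)
    return target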


Adversarial machine learning and instrumental variables for flexible causal modeling

We are going through a new shift in machine learning (ML), where ML models are increasingly being used to automate decision-making in a multitude of domains: what personalized treatment should be administered to a patient, what discount should be offered to an online customer, and other important decisions that can greatly impact people’s lives. The machine learning revolution, however, was primarily driven by problems that are distant from such decision-making scenarios. These problems include predicting what an image depicts, predicting […]
