Issue #26 – Context and Copying in Neural MT

21 Feb 2019. Author: Raj Patel, Machine Translation Scientist @ Iconic

When translating from one language to another, certain words and tokens need to be copied, not translated, into the target sentence. This includes proper nouns, names, numbers, and ‘unknown’ tokens. We want these to appear in the translation just as they were in the original text. Neural MT systems with subword vocabulary are capable of copying […]

Issue #24 – Exploring language models for Neural MT

07 Feb 2019. Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic

Monolingual language models were a critical part of Phrase-based Statistical Machine Translation systems. They are also used in unsupervised Neural MT systems (unsupervised means that no parallel data is available to supervise training; in other words, only monolingual data is used). However, they are not used in standard supervised Neural MT engines, and training language models has disappeared from common […]

Issue #23 – Unbiased Neural MT

01 Feb 2019. Author: Raj Patel, Machine Translation Scientist @ Iconic

A recent topic of conversation and interest in the area of Neural MT – and Artificial Intelligence in general – is gender bias. Neural models are trained using large text corpora which inherently contain social biases and stereotypes, and as a consequence, translation models inherit these biases. In this article, we’ll try to understand how gender bias affects translation quality and discuss a […]

Issue #22 – Mixture Models in Neural MT

24 Jan 2019. Author: Dr. Rohit Gupta, Sr. Machine Translation Scientist @ Iconic

It goes without saying that Neural Machine Translation has become state of the art in MT. However, one challenge we still face is developing a single general MT system which works well across a variety of different input types. As we know from long-standing research into domain adaptation, a system trained on patent data doesn’t perform well when translating software documentation […]

Issue #21 – Revisiting Data Filtering for Neural MT

17 Jan 2019. Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic

The Neural MT Weekly is back for 2019 after a short break over the holidays! 2018 was a very exciting year for machine translation, as documented over the first 20 articles in this series. What was striking was the pace of development, even in the 6 months since we started publishing these articles. This was illustrated by the fact […]

Issue #20 – Dynamic Vocabulary in Neural MT

06 Dec 2018

As has been covered a number of times in this series, Neural MT requires good data for training, and acquiring such data for new languages can be costly and not always feasible. One approach in the Neural MT literature for improving translation quality for low-resource languages is transfer learning. A common practice is to reuse the model parameters (encoder, decoder, and word embeddings) of a high-resource language and fine-tune them […]

Issue #19 – Adaptive Neural MT

29 Nov 2018. Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic

Neural Machine Translation is known to be particularly poor at translating out-of-domain data. That is, an engine trained on generic data will be much worse at translating medical documents than an engine trained on medical data. It is much more sensitive to such differences than, say, Statistical MT. This problem is partially solved by domain adaptation techniques, which we covered in Issue #9 […]

Issue #17 – Speeding up Neural MT

15 Nov 2018. Author: Raj Nath Patel, Machine Translation Scientist @ Iconic

For all the benefits Neural MT has brought in terms of translation quality, producing output quickly and efficiently is still a challenge for developers. All things being equal, Neural MT is slower than its statistical counterpart. This is particularly the case when running translation on standard processors (CPUs) as opposed to faster, more powerful (but also more expensive) graphics processors (GPUs), which is […]

Issue #16 – Revisiting synthetic training data for Neural MT

08 Nov 2018. Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic

In a previous guest post in this series, Prof. Andy Way explained how to create training data for Neural MT through back-translation. This technique involves translating monolingual data in the target language into the source language to obtain a parallel corpus of “synthetic” source and “authentic” target data. Andy reported interesting findings whereby, with a few million […]

Issue #15 – Document-Level Neural MT

01 Nov 2018. Author: Dr. Rohit Gupta, Sr. Machine Translation Scientist @ Iconic

In this week’s post, we take a look at document-level neural machine translation. Most, if not all, existing approaches to machine translation operate at the sentence level. That is to say, when translating a document, it is split up into individual sentences or segments, which are processed independently of each other. With document-level Neural MT, as the name suggests, we are going beyond […]
