Issue #29 – Improving Robustness in Neural MT

28 Mar 2019

Author: Dr. Rohit Gupta, Sr. Machine Translation Scientist @ Iconic

Despite the high level of performance of current Neural MT engines, robustness remains a significant issue when it comes to unexpected, noisy input: when the input is not clean, the quality of the output drops drastically. In this issue, we take a look at the impact of various types of ‘noise’ on translation quality, and we discuss techniques proposed by Vaibhav et al. (2019) to improve the robustness of an NMT system.

Noise and its Impact on Translation Quality

Any piece of text can contain several different types of noise, or errors, depending largely on how it was produced. These include, but are not limited to, the following (a sketch of how some of these can be simulated follows the list):

  • spelling or typographical errors (receive vs recieve)
  • word omission
  • word insertion
  • repetitions
  • grammatical errors (a ton of vs a tons of)
  • spoken language (want to vs wanna)
  • slang (to be honest vs tbh)
  • proper nouns
  • dialects
  • jargon
  • emojis
  • obfuscated profanities (f*ing)
  • OCR-related errors ([4] vs 14], study vs st ud y)
  • inconsistent capitalisation (change vs chaNGE)
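
To make this concrete, here is a minimal sketch of how a few of these noise types (typos, casing noise, word omission) could be injected into clean text for data augmentation. The function names and noise probabilities are illustrative placeholders, not the exact procedure of Vaibhav et al. (2019):

```python
import random

def add_typo(word, p=0.1):
    # With probability p, swap two adjacent characters (a common typo).
    if len(word) > 3 and random.random() < p:
        i = random.randrange(len(word) - 1)
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    return word

def scramble_case(word, p=0.05):
    # With probability p, randomly upper-case characters (chaNGE-style noise).
    if random.random() < p:
        return "".join(c.upper() if random.random() < 0.5 else c for c in word)
    return word

def drop_words(words, p=0.05):
    # Randomly omit words, simulating word-omission noise.
    kept = [w for w in words if random.random() >= p]
    return kept if kept else words  # never return an empty sentence

def noisify(sentence):
    # Apply the noise functions above to one clean sentence.
    words = drop_words(sentence.split())
    return " ".join(scramble_case(add_typo(w)) for w in words)

print(noisify("We will discuss techniques to improve the robustness of an NMT system."))
```

Training on a mix of clean sentences and such synthetically noised copies (paired with the same clean target) is one way to expose the model to the kinds of input it will see at test time.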

Vaibhav et al. (2019) examined four of these error types and measured their impact on translation quality.
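
As a rough illustration of how such an impact can be quantified, the sketch below compares corpus-level BLEU on clean versus artificially noised input. It reuses the noisify() helper from the sketch above; the test data and the translate() wrapper (standing in for any NMT engine) are hypothetical placeholders, not the setup used in the paper:

```python
import sacrebleu
from my_nmt_engine import translate  # placeholder: any string -> string NMT wrapper

# Small parallel test set (placeholder data).
sources = ["I did not receive the package.", "To be honest, I want to go home."]
references = ["Ich habe das Paket nicht erhalten.", "Ehrlich gesagt will ich nach Hause gehen."]

# Translate the clean sources and a noised copy of them.
clean_hyps = [translate(s) for s in sources]
noisy_hyps = [translate(noisify(s)) for s in sources]

# sacrebleu expects a list of hypotheses and a list of reference streams.
clean_bleu = sacrebleu.corpus_bleu(clean_hyps, [references])
noisy_bleu = sacrebleu.corpus_bleu(noisy_hyps, [references])
print(f"BLEU clean: {clean_bleu.score:.1f}  BLEU noisy: {noisy_bleu.score:.1f}")
```

The gap between the two scores gives a simple, reproducible measure of how robust a given engine is to a given noise type.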
