Issue #68 – Incorporating BERT in Neural MT

07 Feb 2020

Author: Raj Patel, Machine Translation Scientist @ Iconic

BERT (Bidirectional Encoder Representations from Transformers) has shown impressive results on various Natural Language Processing (NLP) tasks. However, how to effectively apply BERT to Neural MT has not been fully explored. In general, BERT is fine-tuned for downstream NLP tasks; for Neural MT, a pre-trained BERT model is typically used to initialise the encoder in an encoder-decoder architecture. In this post we discuss an improved technique for incorporating BERT in Neural MT, the BERT-fused model proposed by Zhu et al. (2020).
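
As a point of reference for that initialisation approach, below is a minimal sketch of warm-starting an encoder-decoder model from a pre-trained BERT checkpoint. It assumes the Hugging Face transformers library and the bert-base-multilingual-cased checkpoint; both are illustrative choices, not details taken from the paper or this post:

# Warm-start an encoder-decoder model from a pre-trained BERT checkpoint.
from transformers import BertTokenizer, EncoderDecoderModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
# Only the encoder initialisation matters for the point above; the decoder
# could equally be a randomly initialised Transformer decoder.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-cased", "bert-base-multilingual-cased"
)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id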

Figure 1: Architecture of the BERT-fused model

BERT-fused model

Zhu et al. (2020) propose a modified encoder-decoder architecture in which BERT is first used to extract a representation of the input sequence; this representation is then fused into each layer of the encoder and decoder of the NMT model using cross-attention, as depicted in Figure 1. In both the BERT-enc and BERT-dec attention, the Key (K) and Value (V) are computed from the BERT representation, while the Query (Q) comes from the NMT encoder or decoder states.
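
To make the fusion step concrete, the following is a minimal PyTorch sketch of one encoder layer with an additional BERT-enc attention module. The class and argument names are hypothetical, nn.MultiheadAttention stands in for the attention used in the paper, and details such as padding masks are omitted:

import torch
import torch.nn as nn

class BertFusedEncoderLayer(nn.Module):
    # One NMT encoder layer that fuses a fixed BERT representation
    # via an extra cross-attention (BERT-enc attention).
    def __init__(self, d_model=512, bert_dim=768, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        # Standard self-attention over the NMT encoder states.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                               batch_first=True)
        # BERT-enc attention: Query from the encoder states,
        # Key and Value from the BERT representation.
        self.bert_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                               kdim=bert_dim, vdim=bert_dim,
                                               batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, bert_out):
        # x:        (batch, src_len, d_model)    NMT encoder states
        # bert_out: (batch, bert_len, bert_dim)  BERT representation of the source
        self_out, _ = self.self_attn(x, x, x)
        fused_out, _ = self.bert_attn(x, bert_out, bert_out)
        # Average the two attention outputs, then residual + layer norm.
        x = self.norm1(x + self.drop(0.5 * (self_out + fused_out)))
        x = self.norm2(x + self.drop(self.ffn(x)))
        return x

The decoder layers fuse the BERT representation analogously through a BERT-dec attention module, alongside the usual encoder-decoder attention.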
