Issue #49 – Representation Bottleneck in Neural MT

08 Aug 2019


Author: Raj Patel, Machine Translation Scientist @ Iconic

In Neural MT, lexical features are fed to the first layer of the encoder as lexical representations (aka word embeddings) and are refined as they propagate through the deep network of hidden layers. In this post, we'll try to understand how the lexical representation changes as it goes deeper into the network, and investigate whether this affects translation quality.
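As a rough illustration of that pipeline, the sketch below looks up embeddings for a few token IDs and refines them through a stack of hidden layers. All names and sizes here are illustrative assumptions, and each layer is reduced to a simple linear map plus nonlinearity rather than a real encoder layer:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, n_layers = 100, 8, 4

# Embedding table: one vector per vocabulary item (the "lexical representation").
embeddings = rng.normal(size=(vocab_size, d_model))

# Each hidden layer is sketched as a linear map + nonlinearity (a toy stand-in
# for a real recurrent or transformer encoder layer).
layer_weights = [rng.normal(scale=0.1, size=(d_model, d_model))
                 for _ in range(n_layers)]

def encode(token_ids):
    """Look up embeddings, then refine them layer by layer."""
    h = embeddings[token_ids]      # layer 0: raw lexical representations
    states = [h]
    for W in layer_weights:
        h = np.tanh(h @ W)         # each layer transforms the previous one
        states.append(h)
    return states                  # one representation per layer

states = encode(np.array([3, 17, 42]))
print(len(states), states[-1].shape)  # 5 (3, 8)
```

The list of per-layer states is what the studies below probe: each entry is the same sentence, represented at a different depth of the network.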

Representation Bottleneck

Recently, several studies have investigated the nature of the language features encoded within individual layers of a neural translation model. Belinkov et al. (2018) reported that, in recurrent architectures, different layers prioritise different types of information: lower layers appear to represent morphological and syntactic information, whereas semantic features are concentrated towards the top of the layer stack. Ideally, the information encoded across the various layers would all be transported to the decoder, whereas in practice only the last layer is used.
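One way to expose lower-layer information to the decoder, instead of passing only the top layer, is a learned softmax-weighted mix over all layer outputs (in the style of ELMo; the names and sizes below are illustrative assumptions, not the method of any paper cited here):

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, seq_len, d_model = 4, 3, 8

# Pretend these are the per-layer encoder states for one sentence.
layer_states = rng.normal(size=(n_layers, seq_len, d_model))

# Standard practice: the decoder sees only the top layer.
top_only = layer_states[-1]

# Alternative: a softmax-weighted mix of all layers, so that lower-layer
# (morphological/syntactic) features also reach the decoder. In a real model
# mix_logits would be learned; here they are random for illustration.
mix_logits = rng.normal(size=n_layers)
weights = np.exp(mix_logits) / np.exp(mix_logits).sum()
mixed = np.tensordot(weights, layer_states, axes=1)  # sum over the layer axis

print(top_only.shape, mixed.shape)  # (3, 8) (3, 8)
```

The mixed representation has the same shape as the top layer, so it can be dropped into the encoder-decoder interface without other architectural changes.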

Along the same lines, Emelin et al. (2019) studied the transformer architecture. In the transformer model, information proceeds in a strictly sequential manner, where each layer only has access to the output of the layer immediately below it.
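A simplified sketch of a lexical shortcut, in the spirit of that work: each layer is given a gated connection back to the raw embeddings, so lexical features need not survive every intermediate transformation to be usable higher up. This is a toy rendering under assumed shapes and a toy layer, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, n_layers = 8, 4

embeddings = rng.normal(size=(3, d_model))  # raw lexical representations
layer_W = [rng.normal(scale=0.1, size=(d_model, d_model))
           for _ in range(n_layers)]
gate_W = [rng.normal(scale=0.1, size=(2 * d_model, d_model))
          for _ in range(n_layers)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

h = embeddings
for W, Wg in zip(layer_W, gate_W):
    h = np.tanh(h @ W)
    # Gated shortcut: each layer can re-read the raw embeddings directly,
    # instead of relying only on the previous layer's output.
    g = sigmoid(np.concatenate([h, embeddings], axis=-1) @ Wg)
    h = g * h + (1.0 - g) * embeddings

print(h.shape)  # (3, 8)
```

The gate decides, per dimension, how much of the layer's own output versus the original embedding to keep, which is what lets lexical information flow past the strictly sequential layer stack.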