Issue #73 – Mixed Multi-Head Self-Attention for Neural MT

12 Mar 2020
Author: Dr. Patrik Lambert, Machine Translation Scientist @ Iconic

Self-attention is a key component of the Transformer, a state-of-the-art neural machine translation architecture. In the Transformer, self-attention is divided into multiple heads to allow the system to independently attend to information from different representation subspaces. Recently it has been shown that some redundancy occurs in the multiple heads. In this post, we take a look at approaches which ensure […]
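For readers unfamiliar with the mechanism the post builds on, here is a minimal, illustrative sketch of standard Transformer multi-head self-attention in NumPy. It is not the mixed-attention approach discussed in the post, and the function and weight names (`multi_head_self_attention`, `Wq`, `Wk`, `Wv`, `Wo`) are assumptions chosen for clarity; it simply shows how the projected representation is split into heads that each attend over their own subspace before being concatenated again.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Minimal multi-head self-attention over a sequence X of shape (seq_len, d_model).

    Wq, Wk, Wv, Wo are (d_model, d_model) projection matrices; each head works on
    its own d_model // num_heads slice of the projected representations.
    (Illustrative sketch only, not Iconic's or the paper's implementation.)
    """
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    # Linear projections, then split the feature dimension into heads:
    # resulting shape (num_heads, seq_len, d_head).
    Q = (X @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    K = (X @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    V = (X @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    # Scaled dot-product attention computed independently per head, so each
    # head can attend to information in a different representation subspace.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ V                                    # (heads, seq, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 8, 5, 2
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads)
print(out.shape)  # (5, 8)
```

Because each head sees only a slice of the model dimension, different heads can in principle specialise; the redundancy mentioned above arises when several heads end up learning very similar attention patterns.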
