Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction
This is a fork of Fairseq(-py) with implementations of the following models:

Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction
An NMT model that uses two-dimensional convolutions to jointly encode the source and target sequences. Pervasive Attention also provides an extensive decoding grid that we leverage to train wait-k models efficiently. See README.

Efficient Wait-k Models for Simultaneous Machine Translation
Transformer wait-k models (Ma et al., 2019) with unidirectional encoders and joint training of multiple wait-k paths. […]
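For readers unfamiliar with wait-k decoding, here is a minimal sketch of the read/write schedule defined by Ma et al. (2019): the decoder first reads k source tokens, then alternates between writing one target token and reading one more source token until the source is exhausted. The function names `wait_k_context` and `wait_k_mask` are hypothetical and are not taken from this repository; the sketch only illustrates the policy, not how the models here implement it.

```python
def wait_k_context(k, src_len, tgt_len):
    """Number of source tokens available when emitting each target token.

    Follows g(t) = min(k + t - 1, src_len) for target positions t = 1..tgt_len.
    Hypothetical helper for illustration only.
    """
    return [min(k + t - 1, src_len) for t in range(1, tgt_len + 1)]


def wait_k_mask(k, src_len, tgt_len):
    """Boolean grid: mask[t][j] is True if target step t may see source token j."""
    context = wait_k_context(k, src_len, tgt_len)
    return [[j < visible for j in range(src_len)] for visible in context]


if __name__ == "__main__":
    # With k=3, a 5-token source and a 6-token target:
    print(wait_k_context(3, 5, 6))  # [3, 4, 5, 5, 5, 5]
    for row in wait_k_mask(3, 5, 6):
        print("".join("x" if seen else "." for seen in row))
```

Each row of the mask corresponds to one wait-k path through the source-target grid for a single value of k; training on several values of k would use a different mask for each.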