Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction
This is a fork of Fairseq(-py) with implementations of the following models:
Pervasive Attention – 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction
An NMT model that uses two-dimensional convolutions to jointly encode the source and the target sequences.
Pervasive Attention also provides an extensive decoding grid that we leverage to efficiently train wait-k models.
See README.
Efficient Wait-k Models for Simultaneous Machine Translation
Transformer Wait-k models (Ma et al., 2019) with unidirectional encoders and with joint training of multiple wait-k paths.
See README.
- PyTorch version >= 1.4.0
- Python version >= 3.6
- For training new models, you’ll also need an NVIDIA GPU and NCCL
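Before installing, it can help to verify the requirements above. A minimal sketch of a numeric version check (the `meets_minimum` helper is hypothetical, not part of this repository; note that naive string comparison would wrongly rank `1.10.0` below `1.4.0`):

```python
import sys

def meets_minimum(version: str, minimum: str) -> bool:
    """Compare dotted version strings numerically, component by component."""
    to_tuple = lambda v: tuple(int(p) for p in v.split("."))
    return to_tuple(version) >= to_tuple(minimum)

# Python itself can be checked directly against (3, 6):
python_ok = sys.version_info >= (3, 6)

# PyTorch check (assumes torch is installed; "+cu101"-style local
# suffixes are stripped before parsing):
# import torch
# assert meets_minimum(torch.__version__.split("+")[0], "1.4.0")

print(meets_minimum("1.10.0", "1.4.0"))  # True: 10 > 4 numerically
```

GPU availability for training can then be confirmed with `torch.cuda.is_available()`.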
Installing Fairseq