A complete suite for training sequence-to-sequence models in PyTorch

This is a complete suite for training sequence-to-sequence models in PyTorch. It includes several models, along with code to train them and to run inference with them.
With this code you can train:
- Neural machine translation (NMT) models
- Language models
- Image-to-caption generation models
- Skip-thought sentence representations
- And more…
Installation
git clone --recursive https://github.com/eladhoffer/seq2seq.pytorch
cd seq2seq.pytorch; python setup.py develop
Models
Models currently available:
Datasets
Datasets currently available:
All datasets can be tokenized using one of three segmentation methods:
- Character-based segmentation
- Word-based segmentation
- Byte-pair encoding (BPE), as suggested by bpe, with a selectable number of tokens.
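To make the difference between the three segmentation methods concrete, here is a minimal, standard-library-only sketch. It is an illustration of the general techniques, not this repository's tokenizer: the function names (`char_segment`, `word_segment`, `learn_bpe`, `apply_bpe`) are hypothetical, and the BPE part follows the common "merge the most frequent adjacent symbol pair" scheme with an assumed `</w>` end-of-word marker. The `num_merges` parameter stands in for the "selectable number of tokens": more merges yield a larger subword vocabulary.

```python
from collections import Counter


def char_segment(sentence):
    # Character-based: every character (including spaces) is a token.
    return list(sentence)


def word_segment(sentence):
    # Word-based: split on whitespace.
    return sentence.split()


def _merge(symbols, pair):
    # Replace every adjacent occurrence of `pair` with the concatenated symbol.
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    # Hypothetical helper: learn up to `num_merges` BPE merge operations.
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter()
    for sentence in corpus:
        for word in sentence.split():
            vocab[tuple(word) + ("</w>",)] += 1
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    # Segment a single word by replaying the learned merges in order.
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["low low low lower newest newest"]
    merges = learn_bpe(corpus, num_merges=3)
    print(char_segment("hi there"))
    print(word_segment("hi there"))
    print(merges)
    print(apply_bpe("lowest", merges))  # unseen word split into known subwords
```

Because the merges are learned from corpus statistics, an unseen word such as "lowest" is still segmented into known subword units rather than mapped to a single unknown token, which is the main motivation for BPE over word-based segmentation.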
After choosing a tokenization method,