A PyTorch Implementation of the Transformer: Attention Is All You Need
Our implementation is largely based on the Tensorflow implementation.
Requirements
Why This Project?
I’m new to PyTorch, so I have been implementing some projects with it to learn. Recently, I read the paper Attention Is All You Need and was impressed by the idea, so this project is the result. I got results similar to those of the original Tensorflow implementation.
Differences with the original paper
I don’t intend to replicate the paper exactly. Rather, I aim to implement the main ideas of the paper and verify them in a SIMPLE and QUICK way. In this respect, some parts of my code differ from the paper. Among them are:
- I used the IWSLT 2016 de-en dataset rather than the WMT 2014 datasets used in the paper, since it is much smaller and therefore faster to train on (see the loading sketch below).
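For reference, here is a minimal sketch of how the IWSLT 2016 de-en data could be loaded, assuming the legacy torchtext API (torchtext <= 0.8). The field settings, tokenization, and batch size below are illustrative assumptions, not necessarily the preprocessing this repo uses.

```python
# A minimal sketch, assuming the legacy torchtext API (torchtext <= 0.8);
# the tokenization and special tokens are illustrative, not this repo's setup.
from torchtext.data import Field, BucketIterator
from torchtext.datasets import IWSLT

# Whitespace tokenization as a placeholder for real preprocessing.
SRC = Field(tokenize=str.split, lower=True)
TGT = Field(tokenize=str.split, init_token='<s>', eos_token='</s>', lower=True)

# Downloads the IWSLT 2016 corpus and selects the German-English pair.
train, val, test = IWSLT.splits(exts=('.de', '.en'), fields=(SRC, TGT))

SRC.build_vocab(train, min_freq=2)
TGT.build_vocab(train, min_freq=2)

# BucketIterator groups sentences of similar length to reduce padding.
train_iter, val_iter = BucketIterator.splits((train, val), batch_size=32)
```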