Feedback Transformer and Expire-Span with Python
This repo contains the code for two papers:

- Feedback Transformer
- Expire-Span

The training code is structured for long sequential modeling with Transformer-like architectures.

Requirements

You will need a CUDA-enabled GPU to run the code.

Setup

Run the following:

    pip install -r requirements.txt

Feedback Transformer

Introduced in Addressing Some Limitations of Transformers with Feedback Memory.

Running Experiments from the Paper

enwik8

Model                 Params  Valid  Test
Feedback Transformer  77M     0.984  0.962

Numbers are Bits-Per-Character.

    bash experiments/feedback/enwik8.sh

Algorithmic

Model  3 Variable  5 Variable  […]
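The enwik8 results above are reported in bits per character. As a rough illustration of how that unit relates to a training loss, the sketch below assumes the loss is a mean per-character cross-entropy measured in nats (natural logarithm). The function names are hypothetical and are not part of this repo.

```python
import math


def nats_to_bpc(loss_nats: float) -> float:
    """Convert a mean per-character cross-entropy in nats to bits per character."""
    # Assumption: the loss uses the natural logarithm, so dividing by ln(2)
    # changes the base of the logarithm from e to 2.
    return loss_nats / math.log(2)


def bpc_to_nats(bpc: float) -> float:
    """Inverse conversion: bits per character back to nats per character."""
    return bpc * math.log(2)


if __name__ == "__main__":
    # Test-set figure taken from the enwik8 table above.
    test_bpc = 0.962
    print(f"{test_bpc} BPC corresponds to {bpc_to_nats(test_bpc):.3f} nats per character")
```

Lower is better in both units. The conversion is a constant factor, so comparisons between models come out the same whichever unit is used.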