# Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

This repository contains the code in both **PyTorch** and **TensorFlow** for our paper

> **Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context**
>
> Zihang Dai\*, Zhilin Yang\*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov (\*: equal contribution)
>
> Preprint 2018

## TensorFlow

- The source code is in the `tf/` folder, supporting (1) single-node multi-GPU training and (2) multi-host TPU training.
- Besides the source code, we also provide pretrained TensorFlow models with the state-of-the-art (SoTA) performance reported in the paper.
- Please refer to `tf/README.md` […]
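The paper's title refers to Transformer-XL's ability to attend beyond a single fixed-length segment by caching hidden states from previous segments and reusing them as extra context. The sketch below illustrates that idea only and is not the implementation in this repository's `tf/` or PyTorch code: it is single-head, caches raw segment inputs rather than per-layer hidden states, omits causal masking and the paper's relative positional encodings, and every name in it is hypothetical.

```python
# Minimal sketch of segment-level recurrence: states from the previous
# segment are cached as "memory" and attended over together with the
# current segment. Illustrative only; all names are hypothetical.
import torch
import torch.nn.functional as F

def attend_with_memory(x, mem, w_q, w_k, w_v):
    """Single-head attention over the current segment plus cached memory.

    x:   current segment,                         shape (seg_len, d_model)
    mem: cached states from the previous segment, shape (mem_len, d_model)
    """
    context = torch.cat([mem, x], dim=0)     # extend the context with memory
    q = x @ w_q                              # queries come from the current segment only
    k, v = context @ w_k, context @ w_v      # keys/values span memory + current segment
    attn = F.softmax(q @ k.t() / k.size(-1) ** 0.5, dim=-1)
    return attn @ v

d_model, seg_len, mem_len = 8, 4, 4
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
mem = torch.zeros(mem_len, d_model)          # empty memory before the first segment
for segment in torch.randn(3, seg_len, d_model):  # a stream of consecutive segments
    out = attend_with_memory(segment, mem, w_q, w_k, w_v)
    mem = segment.detach()                   # cache states; detach() keeps gradients
                                             # from flowing into past segments
```

In the actual model, the memory at each layer is built from the previous segment's hidden states at the layer below and can span multiple past segments; see the paper and `tf/README.md` for the real training setup.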