# DeLighT: Very Deep and Light-weight Transformers
This repository contains the source code of our work on building efficient sequence models: DeFINE (ICLR'20) and DeLighT (preprint).

## Overview

In this repository, we share the source code of our paper DeLighT, which delivers similar or better performance than transformer-based models with significantly fewer parameters. DeLighT allocates parameters more efficiently both (1) within each Transformer block, using DExTra, a deep and light-weight transformation, and (2) across blocks, using block-wise scaling, which allows for shallower and narrower DeLighT blocks near the input and wider and deeper DeLighT blocks near the output.
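To illustrate block-wise scaling, here is a minimal sketch in Python of the linear scaling rule described in the paper: the b-th of B blocks is assigned roughly `N_min + (N_max - N_min) * b / (B - 1)` GLT layers in its DExTra unit, so depth grows from input to output. The function name and rounding choice below are illustrative, not the repository's API; the actual implementation may round differently.

```python
def blockwise_depths(num_blocks: int, n_min: int, n_max: int) -> list[int]:
    """Depth (number of GLT layers) per DeLighT block, shallow -> deep.

    Sketch of the linear block-wise scaling rule; rounding is an assumption.
    """
    if num_blocks == 1:
        return [n_min]
    return [
        round(n_min + (n_max - n_min) * b / (num_blocks - 1))
        for b in range(num_blocks)
    ]

# Example: 8 blocks scaling from 4 to 8 GLT layers.
print(blockwise_depths(8, 4, 8))  # [4, 5, 5, 6, 6, 7, 7, 8]
```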