Block Sparse Matrices for Smaller and Faster Language Models
In previous blog posts we introduced sparse matrices and what they can do to improve neural networks. The basic assumption is that fully dense layers are often overkill and can be pruned without a significant loss in precision. In some cases, sparse linear layers can even improve precision and/or generalization. The main issue is that currently available code for sparse algebra computation is severely lacking in efficiency. We are also still waiting for official PyTorch support. That’s why we ran […]
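To make the pruning idea concrete, here is a minimal sketch (not the post's block-sparse implementation): magnitude-prune a dense linear layer's weight and store the survivors in PyTorch's built-in COO sparse format. The layer sizes and the 90% sparsity level are arbitrary, chosen for illustration only.

```python
import torch
import torch.nn as nn

# Illustrative only: a dense layer whose weight we will prune.
dense = nn.Linear(1024, 1024)

with torch.no_grad():
    w = dense.weight
    # Keep only the ~10% largest-magnitude weights; zero out the rest.
    k = int(0.9 * w.numel())
    threshold = w.abs().flatten().kthvalue(k).values
    mask = w.abs() > threshold
    pruned = w * mask

    # Store the pruned weight as a COO sparse tensor.
    w_sparse = pruned.to_sparse()

x = torch.randn(32, 1024)
# torch.sparse.mm computes (sparse @ dense); re-add the bias manually.
y = torch.sparse.mm(w_sparse, x.t()).t() + dense.bias
```

Note that with this kind of unstructured sparsity, the general-purpose sparse kernels are typically no faster than a dense matmul, which is exactly the efficiency gap the post describes and the motivation for block-sparse formats.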