Dion: the distributed orthonormal update revolution is here

Training AI models requires choosing an optimizer, and for nearly a decade Adam(-W) has been the optimizer of choice. Given that durability and success, it was