EMO: Pretraining mixture of experts for emergent modularity
🧠 Models: https://huggingface.co/collections/allenai/emo | 📄 Tech report: https://allenai.org/papers/emo | 💻 Code: https://github.com/allenai/EMO | 📊 Visualization: https://emovisualization.netlify.app/

Today we’re releasing EMO, a new mixture-of-experts (MoE) model pretrained end-to-end so that modular structure emerges directly from the data, without relying on human-defined priors. EMO lets you use a small subset of its experts – just 12.5% of the total – for a given task while keeping near full-model performance, and it still works as a strong general-purpose model when all experts are used […]
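To make the expert-subset idea concrete, here is a minimal, hypothetical sketch – not EMO's actual code or API – of a generic top-k MoE layer whose router can be masked to a chosen subset of experts, e.g. 8 of 64 (12.5%). All names and sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedMoELayer(nn.Module):
    """Toy top-k MoE layer whose router can be restricted to a subset of experts.

    Illustrative only; not the EMO implementation.
    """

    def __init__(self, d_model=64, n_experts=64, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x, allowed_experts=None):
        # x: (tokens, d_model)
        logits = self.router(x)                      # (tokens, n_experts)
        if allowed_experts is not None:
            # Mask out every expert not in the allowed subset before top-k routing.
            mask = torch.full_like(logits, float("-inf"))
            mask[:, allowed_experts] = 0.0
            logits = logits + mask
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique():
                sel = idx[:, slot] == e
                out[sel] += weights[sel, slot, None] * self.experts[int(e)](x[sel])
        return out

layer = MaskedMoELayer()
x = torch.randn(8, 64)
full_output = layer(x)                                     # all 64 experts available
subset_output = layer(x, allowed_experts=list(range(8)))   # 12.5% of the experts
```

In this sketch the subset is picked by hand; the announcement's claim is that EMO's emergent modularity is what makes such task-specific subsets work with little loss in quality.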