March 13, 2026 huggingface

Ettin Suite: SoTA Paired Encoders and Decoders

What would happen if you took the ModernBERT recipe and applied it to a decoder-only model? It turns out you get a state-of-the-art decoder language model that beats Llama 3.2 1B and SmolLM2!

We introduce a new open-data training recipe to reproduce the encoder-only ModernBERT model (and actually beat it!). We then apply the exact same recipe to decoder-only models. For the first time, we have two state-of-the-art models trained in the same setup but with two different training objectives: masked language modeling (MLM) and causal language modeling (CLM).
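To make the difference between the two objectives concrete, here is a minimal sketch in plain Python of how each one turns the same token sequence into model inputs and prediction targets. It is an illustration only, not the Ettin training code: the whitespace tokenizer, the `[MASK]` string, the helper names `clm_example` and `mlm_example`, and the `mask_prob` value are all assumptions made for this example.

```python
import random

MASK = "[MASK]"   # hypothetical mask symbol for this sketch
IGNORE = None     # marks positions that do not contribute to the loss


def clm_example(tokens):
    """Causal LM: at each position, the target is the next token."""
    inputs = tokens[:-1]   # everything except the last token
    labels = tokens[1:]    # the same sequence shifted left by one
    return inputs, labels


def mlm_example(tokens, mask_prob=0.3, seed=0):
    """Masked LM: hide some tokens and predict only the hidden ones."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)    # the model sees the mask symbol
            labels.append(tok)     # and must recover the original token
        else:
            inputs.append(tok)     # visible token, usable as context
            labels.append(IGNORE)  # no loss on unmasked positions
    return inputs, labels


# Toy whitespace "tokenizer", assumed for illustration only.
tokens = "the model reads text and predicts tokens".split()

clm_inputs, clm_labels = clm_example(tokens)
mlm_inputs, mlm_labels = mlm_example(tokens)

print("CLM inputs:", clm_inputs)
print("CLM labels:", clm_labels)
print("MLM inputs:", mlm_inputs)
print("MLM labels:", mlm_labels)
```

In the causal case every position has a target but the model only sees tokens to its left, while in the masked case only the hidden positions have targets and the model can use context on both sides.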

This blog post introduces Ettin, the first suite of SoTA paired

To finish reading, please visit source site
