Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

Mellum2 is a 12B-parameter Mixture-of-Experts model trained from scratch on natural language and code.
The model activates only 2.5B parameters per token, making it efficient for high-throughput, low-latency inference.
Mellum2 can be used for routing, RAG, summarization, sub-agents, high-throughput coding features, and private deployments.
It is released under the Apache 2.0 license.
Compared with similar-sized models, Mellum2 delivers competitive benchmark performance while achieving more than 2x faster inference.
Download the model on Hugging Face: https://huggingface.co/collections/JetBrains/mellum-2
For architecture details, training setup, benchmarks, and evaluation methodology, read the full technical report: https://arxiv.org/pdf/2605.31268
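
To put the parameter counts above in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the two figures stated in this post (12B total, 2.5B active per token). The function names are hypothetical, and the "about 2 FLOPs per active parameter" estimate is a common rule of thumb assumed here for illustration, not a number from the technical report.

```python
# Figures taken from the announcement.
TOTAL_PARAMS = 12e9     # total parameters in the model
ACTIVE_PARAMS = 2.5e9   # parameters activated per token


def active_fraction(active: float, total: float) -> float:
    """Share of the model's weights used to process a single token."""
    return active / total


def forward_flops_per_token(active: float) -> float:
    """Rough forward-pass cost per token.

    Assumption: ~2 FLOPs (one multiply, one add) per active parameter.
    This ignores attention over the context, routing overhead, and
    memory-bandwidth effects.
    """
    return 2.0 * active


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active share per token: {fraction:.1%}")
print(f"Estimated forward FLOPs per token: {forward_flops_per_token(ACTIVE_PARAMS):.2e}")
```

Under these assumptions, roughly one fifth of the weights participate in each token's forward pass. Real-world throughput also depends on hardware, batch size, and the serving stack, so treat this as an intuition aid rather than a performance prediction.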

Today we’re releasing Mellum2, an open Mixture-of-Experts model optimized for low-latency text-and-code workloads.
Mellum originally started as a code completion model. With Mellum2, we extend that foundation to a broader set of natural language and software engineering tasks while keeping the model focused on efficient inference and deployability.
Modern AI systems increasingly rely on multiple model calls: routing, retrieval, summarization, planning, validation, and tool use.
