Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic

Guides have aided humanity throughout history. Prehistoric civilizations understood that the sun and the moon could be used to navigate vast distances on land and the high seas. Over time, various journeys facilitated the production of maps for better planning and faster travel time to repeat destinations. Centuries later, the introduction of the compass enabled seagoers to achieve greater accuracy in seeking unexplored destinations. And today, GPS navigation apps guide our every journey. In today’s world of agentic AI, AI […]

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

Mellum2 is a 12B-parameter Mixture-of-Experts model trained from scratch on natural language and code. The model activates only 2.5B parameters per token, making it efficient for high-throughput, low-latency inference. Mellum2 can be used for routing, RAG, summarization, sub-agents, high-throughput coding features, and private deployments. It is released under the Apache 2.0 license. Compared with similar-sized models, Mellum2 delivers competitive benchmark performance while achieving more than 2x faster inference. Download the model on Hugging Face: https://huggingface.co/collections/JetBrains/mellum-2 For architecture details, training […]

Holo3.1: Fast & Local Computer Use Agents

Last March, we released Holo3, our state-of-the-art computer-use model. Adoption was immediate. Developers, enterprises, and partners started deploying Holo3 across a wide range of workflows, from browser automation and business software to internal tools and desktop applications. As adoption grew, we realized performance alone was no longer enough. Users want to run the same computer-use capabilities across desktop and mobile environments, with seamless integration with different agent frameworks. They want deployment flexibility, from cloud inference to fully local execution on […]

Adding MCP Tools to Reachy Mini

Reachy Mini no longer has to look out the window to tell you the weather. The Reachy Mini conversation app can now use tools hosted in public Hugging Face Spaces, called over MCP. You can give your robot a new ability, like checking the weather or searching the web, by adding a Space from the Hub instead of editing the app. The tool keeps running in the Space itself, so no code is downloaded onto your machine. And you can […]

Direct Preference Optimization Beyond Chatbots

In April, we released DharmaOCR, our specialized structured OCR model (available on Hugging Face) along with a paper detailing the methodology behind it and a benchmark demonstrating its superior quality and cost efficiency. The paper benchmarked leading vision-language model families - both open-source and commercial - on a structured document extraction task: OCR on Brazilian Portuguese text. Among the reported metrics was text degeneration rate: the frequency with which a model produces a repetition loop instead of a transcription. Across the tested open-source families, […]

Designing the hf CLI as an agent-optimized way to work with the Hub

hf is the official command-line entrypoint to the Hugging Face Hub. Anything you can do on the Hub from the Python SDK, you can do from your terminal: download and upload models, datasets and Spaces; create and manage repos, branches, tags and pull requests; run Jobs on HF infrastructure; manage Buckets, Collections, webhooks and Inference Endpoints. The hf CLI has been primarily built for our users over the years. But it’s now increasingly used by coding agents: Claude Code, Codex, […]

EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios

Voice agent failures are often highly domain-specific. A system that flawlessly processes alphanumeric confirmation codes in flight re-booking transactions might stumble when handling complex policies in HR systems. Different domains test an agent’s ability to adapt to different vocabulary, workflow complexities and user expectations. So with this release, EVA-Bench expands from one enterprise domain to three: Airline Customer Service Management (CSM), Enterprise IT Service Management (ITSM), and Healthcare HR Service Delivery (HRSD). Together they span 213 evaluation scenarios across […]

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

The last two years have seen NVIDIA’s content safety stack grow from a focused English text classifier into a family of specialized models—each extending coverage to new modalities, languages, and inference modes. Nemotron 3 Content Safety, released in March 2026, combined multimodal and multilingual capabilities for the first time in a single 4B-parameter model. Today, we are releasing Nemotron 3.5 Content Safety, which completes that arc: a single model that unifies multimodal input, multilingual reach, custom enterprise policy enforcement, and […]

Amazing Digital Dentures (a failed project)

So my idea was simple and a bit complicated at the same time. Have you guys watched The Amazing Digital Circus? It’s an animated show featuring an AI pair of dentures named Caine, who lives in a virtual circus with some digital clones of real human beings and creates and sends them on adventures every day. My project was inspired by that: a digital pet that sends you on adventures that may be useful to your […]

How I grade 200 exams every year

Three years ago, I took the introductory machine learning course over from Milan Straka, and one of the problems I had to deal with was: how do I grade 250 written exams without it consuming my entire life? (Part of the answer is, of course, to ask colleagues for help, but this blog post is about the technical stuff that makes it easier.) Milan had already established one good constraint: there is a fixed public list of exam questions that students […]
