Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL

TL;DR, because you have models to train and we respect that: Async RL has a dirty secret: every step, the trainer has to ship the whole model to the inference engine. For a 7B in bf16 that is 14 GB. For a frontier 1T model checkpoint that is on the order of a terabyte. Per step. It turns out you do not have to. Between two consecutive RL optimizer steps, roughly 99% of bf16 weights are bit-identical (and never less […]

Read more

Reachy Mini goes fully local

After building your Reachy Mini, you’ll install the conversation app and start talking to it. Until now, you had to send your audio to a server. But not anymore. Today we’ll walk you through running the whole stack locally. This stack is powered by speech-to-speech, our cascaded VAD → STT → LLM → TTS pipeline that exposes a Realtime API-compatible /v1/realtime WebSocket. Once you launch the backend, point the robot at it from the UI. Cascades are the most flexible […]

Read more

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

Artificial Analysis and IBM Software Innovation Lab are launching ITBench-AA, the first in a new series of benchmarks evaluating models on agentic enterprise IT tasks, starting with Site Reliability Engineering tasks where frontier models score below 50% ITBench-AA’s SRE tasks benchmark model performance on Kubernetes incident response, where models and agents must diagnose live systems by reading logs, tracing dependencies, and identifying root-cause entities across complex infrastructure. The underlying ITBench dataset has been developed by IBM, leveraging deep expertise in […]

Read more

Profiling in PyTorch (Part 1): A Beginner’s Guide to torch.profiler

What you cannot profile, you cannot optimize. Whether you are trying to squeeze more tokens per second out of a Large Language Model (LLM), shave milliseconds off inference, or just understand why your training loop runs slower than the spec sheet promises, the path eventually runs through profiling. The catch is that profiling has a steep on-ramp. The traces are dense walls of colored rectangles. The events carry intimidating names. Most tutorials assume you can already read them. So even […]

Read more

The Open Agent Leaderboard

How good are general purpose AI agents? We built an open evaluation framework to find out. Most evaluations in AI report a simple result: what score each model got on which benchmarking task. When you deploy an agent, you’re not just choosing a model. You’re choosing a full system: what tools the agent can use, how it plans its steps, what it remembers between actions, how it recovers when something goes wrong. Change any of those and the same model […]

Read more

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

PaddleOCR 3.5 brings OCR and document parsing tasks closer to the Hugging Face ecosystem. With this release, supported PaddleOCR models can run with Hugging Face Transformers as an inference backend by setting: engine=”transformers” PaddleOCR continues to provide OCR model series such as PP-OCRv5 and document parsing model series such as PaddleOCR-VL 1.5, while Transformers becomes one of the supported backends for running them. Try the live demo on Hugging Face Spaces: https://huggingface.co/spaces/PaddlePaddle/paddleocr-3.5-transformers-demo What changed? PaddleOCR 3.5 introduces a more flexible […]

Read more

Introducing the Ettin Reranker Family

Today I’m releasing six new Sentence Transformers CrossEncoder rerankers, state-of-the-art at their respective sizes, built on top of the Ettin ModernBERT encoders, together with the data and full training recipe that produced them: The models were trained with a distillation recipe: pointwise MSE on mixedbread-ai/mxbai-rerank-large-v2 scores over cross-encoder/ettin-reranker-v1-data, which is a subset of lightonai/embeddings-pre-training mixed with a reranked subset of lightonai/embeddings-fine-tuning. Our six rerankers paired with google/embeddinggemma-300m on MTEB(eng, v2) Retrieval. See Results for five more embedder pairings. If you’re […]

Read more

OlmoEarth v1.1: A more efficient family of Earth observation models

🧠 Models: https://huggingface.co/collections/allenai/olmoearth | 📄 Tech Report: https://allenai.org/papers/olmoearth_v1_1 | 💻 Code: https://github.com/allenai/olmoearth_pretrain We released OlmoEarth (v1) in November 2025. Since then, partners have applied it across a wide range of tasks, from tracking mangrove change to classifying drivers of forest loss to producing country-scale crop-type maps in days, scaling deployments to national, continental, and global areas. Every release moves us closer to our mission: bringing state-of-the-art AI to organizations and communities working to protect people and our planet. When OlmoEarth […]

Read more

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

Large language models (LLMs) have become the default interface for code generation, math problem solving, summarization, document understanding, and many other developer workflows. Under the hood, though, many LLMs still generate text the same way: one token at a time, and each token depends on the tokens that appeared before it. As such, these models are called autoregressive, since they consume their own outputs. That autoregressive (AR) approach has been remarkably successful. It is stable to train, simple to serve, […]

Read more
1 2 3 4 78