Mixture of Experts (MoE) in Transformers
Over the past few years, scaling dense language models has driven most of the progress in LLMs. From early models like ULMFiT (~30M parameters) and GPT-2 (1.5B parameters, at the time considered “too dangerous to release” 🧌) to today’s hundred-billion–parameter systems, the recipe was simple: more data plus more parameters yields better performance. Scaling laws reinforced this trend, but dense scaling has practical limits: training becomes increasingly expensive, inference latency grows, and deployment requires significant memory and […]