Welcome to the Falcon 3 Family of Open Models!

We introduce Falcon3, a family of decoder-only large language models under 10 billion parameters, developed by the Technology Innovation Institute (TII) in Abu Dhabi. By pushing the boundaries of performance and training efficiency, this release reflects our ongoing commitment to advancing open and accessible large foundation models. Falcon3 represents a natural evolution from previous releases, […]

Read more

Bamba: Inference-Efficient Hybrid Mamba2 Model 🐍

We introduce Bamba-9B, an inference-efficient Hybrid Mamba2 model trained by IBM, Princeton, CMU, and UIUC on completely open data. At inference time, the model demonstrates 2.5x throughput improvement and 2x latency speedup compared to standard transformers in vLLM. To foster community experimentation, the model is immediately available to use in transformers, vLLM, TRL, and llama.cpp. We also release tuning, training, and extended pretraining recipes with a stateful data loader, and invite the community to further improve this model. Let’s overcome […]

Read more

Finally, a Replacement for BERT

This blog post introduces ModernBERT, a family of state-of-the-art encoder-only models representing improvements over older-generation encoders across the board, with an 8192-token sequence length, better downstream performance, and much faster processing. ModernBERT is available as a slot-in replacement for any BERT-like model, in both a base (149M params) and a large (395M params) size. ModernBERT will be included in v4.48.0 of transformers. Until then, it requires installing transformers from […]

Read more

Visualize and understand GPU memory in PyTorch

You must be familiar with this message 🤬: RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.93 GiB total capacity; 6.00 GiB already allocated; 14.88 MiB free; 6.00 GiB reserved in total by PyTorch) While it’s easy to see that GPU memory is full, understanding why and how to fix it can be more challenging. In […]
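As a quick aside, the numbers in that error message already tell most of the story. A minimal sketch of the arithmetic, using the figures from the example message above (the breakdown of the "outside PyTorch" remainder into CUDA context, other processes, and fragmentation is an assumption about typical causes, not something the message itself reports):

```python
# Arithmetic behind the example CUDA OOM message (all values in bytes).
GIB = 1024 ** 3
MIB = 1024 ** 2

total_capacity = 7.93 * GIB   # GPU 0 total capacity
allocated      = 6.00 * GIB   # tensors currently allocated by PyTorch
reserved       = 6.00 * GIB   # held by PyTorch's caching allocator
free           = 14.88 * MIB  # what the allocator can still hand out
requested      = 20.00 * MIB  # the allocation that failed

# Memory consumed outside PyTorch's reservation (typically the CUDA
# context, other processes, and allocator rounding) accounts for the rest:
outside = total_capacity - reserved - free
print(f"outside PyTorch: {outside / GIB:.2f} GiB")  # ~1.92 GiB

# The request fails because it exceeds what is immediately free:
print(requested > free)  # True
```

Note that "allocated" and "reserved" being equal here means the caching allocator has no spare reserved blocks to satisfy the 20 MiB request, which is why only the 14.88 MiB headroom matters.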

Read more

Introducing smolagents, a simple library to build agents

Today we are launching smolagents, a very simple library that unlocks agentic capabilities for language models. Here’s a glimpse: from smolagents import CodeAgent, DuckDuckGoSearchTool, HfApiModel agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=HfApiModel()) agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?") […]

Read more

CO₂ Emissions and Models Performance: Insights from the Open LLM Leaderboard

Since June 2024, we have evaluated more than 3,000 models on the Open LLM Leaderboard, a worldwide ranking of open language model performance. Even though we try to run evaluations without wasting resources (we use the spare cycles of our cluster, in other words the GPUs that are active but waiting between jobs), this still represents a substantial amount of energy spent on model inference! In the last year, people have become more and more aware that using large […]

Read more

AI Agents Are Here. What Now?

Introduction The sudden, rapid advancement of LLM capabilities – such as writing fluent sentences and achieving increasingly high scores on benchmarks – has led AI developers and businesses alike to look toward what comes next: What game-changing technology is just on the horizon? One technology taking off very recently is “AI agents”, systems that can take actions in the digital world aligned with a deployer’s goals. Most of today’s AI agents […]

Read more