Build awesome datasets for video generation

(This post was authored by hlky and Sayak) Tooling for image generation datasets is well established, with img2dataset being a fundamental tool used for large-scale dataset preparation, complemented by various community guides, scripts, and UIs that cover smaller-scale initiatives. Our ambition is to make tooling for video generation datasets equally established, by creating open video […]

Read more

From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub

Content-defined chunking (CDC) plays a central role in enabling deduplication within a Xet-backed repository. The idea is straightforward: break each file’s data into chunks, store only unique ones, reap the benefits. In practice, it’s more complex. If we focused solely on maximizing deduplication, the design would call for the smallest possible chunk size. By doing that, we’d create significant overheads for the infrastructure and the builders on the Hub. On Hugging Face’s Xet team, we’re bringing CDC from theory to […]
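The core idea — content-defined boundaries plus a dedup store keyed by chunk hash — can be sketched in a few lines. This is a toy illustration, not the Xet team's implementation: the hash below is a simple content-dependent accumulator (a real CDC scheme rolls a hash over a fixed window, e.g. gearhash), and the chunk store is an in-memory dict.

```python
import hashlib

def chunk_boundaries(data: bytes, mask: int = 0x3FF, min_size: int = 16):
    """Yield content-defined chunks using a toy content-dependent hash.

    A boundary is declared whenever the hash matches a bit mask, so
    boundaries depend on content, not position: inserting bytes early
    in a file shifts only the chunks near the edit, and everything
    after the next boundary still deduplicates.
    """
    h, start = 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF  # toy hash; real CDC uses a rolling window
        if i - start + 1 >= min_size and (h & mask) == 0:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]

def dedup(files):
    """Store only unique chunks, keyed by their SHA-256 digest."""
    store, manifests = {}, []
    for data in files:
        manifest = []
        for chunk in chunk_boundaries(data):
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # unique chunks stored once
            manifest.append(digest)
        manifests.append(manifest)
    return store, manifests
```

The `mask` controls the expected chunk size (here roughly 1 KiB): a smaller mask means smaller chunks and better deduplication, but more chunks to track — exactly the overhead trade-off the post describes.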

Read more

1 Billion Classifications

You’ve optimized your model. Your pipeline is running smoothly. But now, your cloud bill has skyrocketed. Running 1B+ classifications or embeddings per day isn’t just a technical challenge—it’s a financial one. How do you process at this scale without blowing your budget? Whether you’re running large-scale document classification or bulk embedding pipelines for Retrieval-Augmented Generation (RAG), you need cost-efficient, high-throughput inference to […]
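The back-of-the-envelope arithmetic behind that question is simple: at a fixed per-instance throughput and hourly price, the cost of a billion inferences follows directly. The numbers below are purely illustrative assumptions, not figures from the post:

```python
def cost_per_billion(throughput_per_sec: float, hourly_rate_usd: float) -> float:
    """USD to run 1B inferences at a sustained per-instance throughput."""
    seconds = 1_000_000_000 / throughput_per_sec
    return seconds / 3600 * hourly_rate_usd

# Illustrative only: an instance at $4/hr sustaining 2,000 classifications/s
print(round(cost_per_billion(2_000, 4.0), 2))  # → 555.56
```

Doubling throughput (batching, a smaller model, a faster runtime) halves the bill, which is why throughput optimization dominates at this scale.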

Read more

Fixing Open LLM Leaderboard with Math-Verify

3 weeks ago, we showed how hard it is to correctly evaluate LLM performance on math problems, and introduced Math-Verify, a better solution to validate models on math (read more in the announcement)! Today, we’re thrilled to share that we’ve used Math-Verify to thoroughly re-evaluate all 3,751 models ever submitted to the Open LLM Leaderboard, for even fairer and more robust model comparisons! Why math evaluation on the Open LLM Leaderboard was broken The […]
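Why is verifying math answers hard? Naive string matching marks "1/2" wrong when the gold answer is "0.5", even though they are the same value. A minimal sketch of value-based equivalence checking — not Math-Verify's actual implementation, which handles LaTeX, sets, intervals, and symbolic expressions — looks like this:

```python
from fractions import Fraction

def normalize(ans: str):
    """Parse a few common answer formats into a canonical exact value."""
    s = ans.strip().strip("$").replace(" ", "")
    try:
        if "/" in s:
            num, den = s.split("/")
            return Fraction(int(num), int(den))
        return Fraction(s)  # handles "0.5", "2", "-3", etc., exactly
    except (ValueError, ZeroDivisionError):
        return None  # unparseable answers never count as correct

def equivalent(pred: str, gold: str) -> bool:
    p, g = normalize(pred), normalize(gold)
    return p is not None and p == g

print(equivalent("1/2", "0.5"))  # → True, where string matching would fail
```

Re-scoring with a checker like this (rather than exact-match) is what lets previously "wrong" but mathematically correct answers be credited, shifting leaderboard rankings.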

Read more

PaliGemma 2 Mix – New Instruction Vision Language Models by Google

Last December, Google released PaliGemma 2: a new family of pre-trained (pt) PaliGemma vision language models (VLMs) based on SigLIP and Gemma 2. The models come in three different sizes (3B, 10B, 28B) and three different resolutions (224×224, 448×448, 896×896). Today, Google is releasing PaliGemma 2 mix: fine-tuned on a mix of vision language tasks, including OCR, long and short captioning and more. PaliGemma 2 pretrained (pt) variants are great vision language models to transfer on a given task at […]

Read more

SmolVLM2: Bringing Video Understanding to Every Device

SmolVLM2 represents a fundamental shift in how we think about video understanding – moving from massive models that require substantial computing resources to efficient models that can run anywhere. Our goal is simple: make video understanding accessible across all devices and use cases, from phones to servers. We are releasing models in three sizes (2.2B, 500M and 256M), MLX ready (Python and Swift APIs) from day zero. We’ve made all models and demos available in this collection. Want to try […]

Read more

SigLIP 2: A better multilingual vision language encoder

Today Google releases a new and better family of multilingual vision-language encoders, SigLIP 2. The authors have extended the training objective of SigLIP (sigmoid loss) with additional objectives for improved semantic understanding, localization, and dense features. SigLIP 2 models outperform the older SigLIP ones at all model scales in core capabilities, including zero-shot classification, image-text retrieval, and transfer performance when extracting visual representations for Vision-Language Models (VLMs). A cherry on top is the dynamic resolution (naflex) variant. This is useful […]
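The sigmoid loss that distinguishes SigLIP from CLIP-style training treats every image-text pair in the batch as an independent binary classification (match / no match), rather than a softmax over the whole batch. A plain-Python sketch of the pairwise loss, with the temperature `t` and bias `b` as learnable scalars in the real model (the values here are illustrative):

```python
import math

def siglip_loss(img, txt, t=10.0, b=-10.0):
    """Sigmoid contrastive loss over every image-text pair in a batch.

    Matching pairs (i == j) get label +1, all other pairs -1; each
    pair contributes -log sigmoid(label * logit) independently, so
    there is no batch-wide softmax normalization.
    """
    n = len(img)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sim = sum(a * c for a, c in zip(img[i], txt[j]))  # dot product
            logit = t * sim + b
            label = 1.0 if i == j else -1.0
            total += math.log1p(math.exp(-label * logit))  # -log sigmoid
    return total / (n * n)
```

Because each pair is scored independently, the loss decomposes cleanly, which is part of what makes the sigmoid objective simpler to scale across devices than a batch-global softmax.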

Read more

Hugging Face, IISc partner to supercharge model building on India’s diverse languages

The Indian Institute of Science (IISc) and ARTPARK partner with Hugging Face to enable developers across the globe to access Vaani, India’s most diverse open-source, multi-modal, multi-lingual dataset. Both organisations share a commitment to building inclusive, accessible, and state-of-the-art AI technologies that honor linguistic and cultural diversity. Partnership The partnership between Hugging Face and IISc/ARTPARK aims to increase the accessibility and improve the usability of the Vaani dataset, encouraging the development of AI […]

Read more