SigLIP 2: A better multilingual vision language encoder

Today Google releases a new and better family of multilingual vision-language encoders, SigLIP 2. The authors have extended the training objective of SigLIP (sigmoid loss) with additional objectives for improved semantic understanding, localization, and dense features. SigLIP 2 models outperform the older SigLIP models at all scales in core capabilities, including zero-shot classification, image-text retrieval, and transfer performance when extracting visual representations for Vision-Language Models (VLMs). A cherry on top is the dynamic-resolution (NaFlex) variant. This is useful […]
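
As a quick taste of what the new encoders do out of the box, here is a minimal zero-shot classification sketch using the transformers pipeline; the checkpoint id is an assumption, so check the Hub for the actual SigLIP 2 model ids.

```python
from transformers import pipeline

# Checkpoint id is an assumption; look up the released SigLIP 2 ids on the Hub.
classifier = pipeline(
    task="zero-shot-image-classification",
    model="google/siglip2-base-patch16-224",
)

predictions = classifier(
    "path/to/image.jpg",  # local path or URL
    candidate_labels=["a photo of a cat", "a photo of a dog"],
)
print(predictions)  # one {"label", "score"} dict per candidate label
```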

Read more

Hugging Face, IISc partner to supercharge model building on India’s diverse languages

The Indian Institute of Science (IISc) and ARTPARK are partnering with Hugging Face to enable developers across the globe to access Vaani, India’s most diverse open-source, multimodal, multilingual dataset. Both organizations share a commitment to building inclusive, accessible, and state-of-the-art AI technologies that honor linguistic and cultural diversity. The partnership between Hugging Face and IISc/ARTPARK aims to increase the accessibility and improve the usability of the Vaani dataset, encouraging the development of AI […]
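
For developers who want to explore the data, a rough sketch of streaming Vaani with the datasets library follows; the repo id and the absence of a config name are assumptions, so consult the official dataset card on the Hub for the exact identifiers and per-language configurations.

```python
from datasets import load_dataset

# Repo id is an assumption; check the official Vaani dataset card on the Hub,
# which may require choosing a language- or district-specific configuration.
vaani = load_dataset("ARTPARK-IISc/Vaani", split="train", streaming=True)

# Stream one sample instead of downloading the full multimodal corpus.
print(next(iter(vaani)))
```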

Read more

Trace & Evaluate your Agent with Arize Phoenix

So, you’ve built your agent. It takes in inputs and tools, processes them, and generates responses. Maybe it’s making decisions, retrieving information, executing tasks autonomously, or all three. But now comes the big question – how effectively is it performing? And more importantly, how do you know? Building an agent is one thing; understanding its behavior is another. That’s where tracing and evaluations come in. Tracing allows you to see exactly what your agent is doing step by step—what inputs […]
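
Phoenix builds on OpenTelemetry, so wiring up tracing is mostly boilerplate. A minimal setup sketch, assuming the arize-phoenix and openinference-instrumentation-openai packages and an agent that calls the OpenAI client:

```python
import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Start a local Phoenix server to collect and visualize traces.
px.launch_app()

# Point an OpenTelemetry tracer provider at Phoenix; the project name is arbitrary.
tracer_provider = register(project_name="my-agent")

# Auto-instrument the OpenAI client so each LLM call is recorded as a span.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
```

From there, every call the agent makes shows up in the Phoenix UI, where individual spans can be inspected and fed into evaluations.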

Read more

A Deep Dive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

With the release of the Aya Vision family, our new 8B and 32B parameter vision-language models (VLMs), we are addressing one of the biggest challenges in AI: bringing multilingual performance to multimodal models. Aya Vision is Cohere For AI’s latest open-weight multilingual and multimodal model family, designed to be a strong foundation for language and vision understanding across 23 languages. It builds on the success of Aya Expanse, our state-of-the-art multilingual language model family, and extends it using a combination of advanced […]
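
Aya Vision ships as open weights on the Hub, so trying it takes a few lines of transformers code. A minimal sketch, assuming the CohereForAI/aya-vision-8b checkpoint id and the image-text-to-text pipeline (verify both on the Hub, and note the model card’s license):

```python
from transformers import pipeline

# Checkpoint id is an assumption; verify it on the Hub and accept the license first.
pipe = pipeline("image-text-to-text", model="CohereForAI/aya-vision-8b")

messages = [
    {"role": "user", "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # hypothetical URL
        {"type": "text", "text": "Describe this image in Spanish."},
    ]},
]
outputs = pipe(text=messages, max_new_tokens=128)
print(outputs[0]["generated_text"])
```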

Read more

LeRobot goes to driving school

TL;DR of L2D, the world’s largest self-driving dataset!
- 90+ terabytes of multimodal data (5,000+ hours of driving) from 30 cities in Germany
- 6x surrounding HD cameras and complete vehicle state: speed/heading/GPS/IMU
- Continuous actions (gas/brake/steering) and discrete actions (gear/turn signals)
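
Since L2D is distributed in LeRobot format, one way to poke at it is through the LeRobotDataset class; the repo id below is an assumption, so check the LeRobot organization on the Hub for the actual release.

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Repo id is an assumption; look up the L2D release under the LeRobot org on the Hub.
dataset = LeRobotDataset("lerobot/l2d")
print(len(dataset))  # number of synchronized frames

# Each item bundles camera frames with vehicle state and action signals.
frame = dataset[0]
print(frame.keys())
```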

Read more

Open R1: Update #3

Over the last few weeks, we have focused our efforts on reproducing the competitive programming (code reasoning) aspects of the DeepSeek-R1 recipe. In this post, we are excited to share:
- CodeForces-CoTs: a dataset of nearly 100k high-quality samples distilled from R1 to produce solutions in C++ and Python.
- The IOI benchmark: a new benchmark of challenging problems from the 2024 International Olympiad in Informatics (IOI).
- OlympicCoder: two fine-tuned 7B and 32B code models that outperform closed-source frontier […]
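
The CodeForces-CoTs dataset lives on the Hub, so inspecting it takes a couple of lines; the dataset id and default config below are assumptions, so check the Open R1 organization for the exact names.

```python
from datasets import load_dataset

# Dataset id and config are assumptions; see the Open R1 organization on the Hub.
codeforces_cots = load_dataset("open-r1/codeforces-cots", split="train")

print(codeforces_cots)               # size and schema overview
print(codeforces_cots.column_names)  # problem statements and R1-distilled solutions
```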

Read more

Welcome Gemma 3: Google’s all-new multimodal, multilingual, long-context open LLM

Today Google releases Gemma 3, a new iteration of their Gemma family of models. The models range from 1B to 27B parameters, have a context window of up to 128k tokens, can accept images and text, and support 140+ languages. Try out Gemma 3 now 👉🏻 Gemma 3 Space

                                 Gemma 2        Gemma 3
Size variants                    2B, 9B, 27B    1B, 4B, 12B, 27B
Context window length            8k             32k (1B), 128k (4B, 12B, 27B)
Multimodality (images and text)  ❌             ❌ (1B), ✅ (4B, 12B, 27B)

[…]
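
Since the 1B variant is text-only, a quick way to try the release is plain text generation through transformers; the checkpoint id below is illustrative, and the larger (4B+) checkpoints also accept images via the image-text-to-text pipeline.

```python
from transformers import pipeline

# Illustrative checkpoint id; the release spans 1B to 27B, base and instruction-tuned.
generator = pipeline("text-generation", model="google/gemma-3-1b-it")

output = generator(
    "Explain the difference between an encoder-only and a decoder-only model.",
    max_new_tokens=64,
)
print(output[0]["generated_text"])
```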

Read more

Xet is on the Hub

Want to skip the details and get straight to faster uploads and downloads with bigger files than ever before? Read about joining the Xet waitlist (or head over to join immediately). Over the past few weeks, Hugging Face’s Xet Team took a major step forward by migrating the first Model and Dataset repositories off LFS and onto Xet storage. This marks one of many steps to fulfill Hugging Face’s vision for the Hub by empowering AI builders […]
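
The migration is meant to be invisible on the client side: uploads still go through the usual huggingface_hub API regardless of the storage backend. A small illustrative sketch, with a hypothetical repo id:

```python
from huggingface_hub import HfApi

api = HfApi()

# Uploads use the same client call whether the repo is backed by LFS or by
# Xet storage; the repo id here is hypothetical.
api.upload_file(
    path_or_fileobj="model.safetensors",
    path_in_repo="model.safetensors",
    repo_id="your-username/your-model",
)
```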

Read more