1 Billion Classifications

You’ve optimized your model. Your pipeline is running smoothly. But now, your cloud bill has skyrocketed. Running 1B+ classifications or embeddings per day isn’t just a technical challenge; it’s a financial one. How do you process at this scale without blowing your budget? Whether you’re running large-scale document classification or bulk embedding pipelines for Retrieval-Augmented Generation (RAG), you need cost-efficient, high-throughput inference to […]
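
As a rough illustration of the bulk-embedding side of this problem (not the post’s exact pipeline; the model name, corpus, and batch size below are placeholders), a batched sentence-transformers loop might look like this:

```python
# A minimal sketch, assuming a GPU and an illustrative embedding model;
# at billion-item scale, per-call overhead is what dominates cost.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda")

docs = [f"document {i}" for i in range(100_000)]  # stand-in for a real corpus

# Large batches amortize tokenization and transfer overhead across many items.
embeddings = model.encode(docs, batch_size=1024, show_progress_bar=True)
print(embeddings.shape)
```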

Read more

Fixing Open LLM Leaderboard with Math-Verify

Three weeks ago, we showed how hard it is to correctly evaluate LLM performance on math problems and introduced Math-Verify, a better solution for validating models on math (read more in the announcement)! Today, we’re thrilled to share that we’ve used Math-Verify to thoroughly re-evaluate all 3,751 models ever submitted to the Open LLM Leaderboard, for even fairer and more robust model comparisons! Why math evaluation on the Open LLM Leaderboard was broken […]
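
To give a flavor of what Math-Verify does, here is a small sketch of comparing a model answer against a gold target; the expressions are illustrative, see the library’s documentation for the full extraction options:

```python
# Math-Verify checks mathematical equivalence rather than string equality,
# so 0.5 and 1/2 count as a match (illustrative example).
from math_verify import parse, verify

gold = parse("$\\frac{1}{2}$")
answer = parse("0.5")

print(verify(gold, answer))  # True
```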

Read more

PaliGemma 2 Mix – New Instruction Vision Language Models by Google

Last December, Google released PaliGemma 2: a new family of pre-trained (pt) PaliGemma vision language models (VLMs) based on SigLIP and Gemma 2. The models come in three different sizes (3B, 10B, 28B) and three different resolutions (224×224, 448×448, 896×896). Today, Google is releasing PaliGemma 2 mix: models fine-tuned on a mix of vision language tasks, including OCR, long and short captioning, and more. PaliGemma 2 pretrained (pt) variants are great vision language models to transfer to a given task at […]
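
A hedged sketch of loading a PaliGemma 2 mix checkpoint with transformers is below; the model id, prompt prefix, and image path are examples, so check the model card for exact usage:

```python
# A minimal sketch, assuming the "google/paligemma2-3b-mix-448" checkpoint
# and a local image file; adjust to the size/resolution you actually want.
import torch
from PIL import Image
from transformers import PaliGemmaForConditionalGeneration, AutoProcessor

model_id = "google/paligemma2-3b-mix-448"  # example checkpoint
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(text="caption en", images=image, return_tensors="pt")
inputs = inputs.to(torch.bfloat16).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(out[0], skip_special_tokens=True))
```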

Read more

SmolVLM2: Bringing Video Understanding to Every Device

SmolVLM2 represents a fundamental shift in how we think about video understanding – moving from massive models that require substantial computing resources to efficient models that can run anywhere. Our goal is simple: make video understanding accessible across all devices and use cases, from phones to servers. We are releasing models in three sizes (2.2B, 500M and 256M), MLX ready (Python and Swift APIs) from day zero. We’ve made all models and demos available in this collection. Want to try […]
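
For a sense of how this looks in practice, here is a rough sketch of video question answering with transformers; the model id, video path, and message format are assumptions, so check the SmolVLM2 model card for the exact API and required extras:

```python
# A minimal sketch, assuming the 2.2B instruct checkpoint and a local clip;
# video inputs go through the chat template as a content item.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"  # example checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "path": "clip.mp4"},  # placeholder video file
        {"type": "text", "text": "Describe what happens in this clip."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```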

Read more

SigLIP 2: A better multilingual vision language encoder

Today, Google releases a new and better family of multilingual vision-language encoders, SigLIP 2. The authors have extended the training objective of SigLIP (sigmoid loss) with additional objectives for improved semantic understanding, localization, and dense features. SigLIP 2 models outperform the older SigLIP ones at all model scales in core capabilities, including zero-shot classification, image-text retrieval, and transfer performance when extracting visual representations for Vision-Language Models (VLMs). A cherry on top is the dynamic resolution (naflex) variant. This is useful […]
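
As a quick sketch of zero-shot classification with a SigLIP 2 checkpoint (the model id, image path, and labels below are illustrative):

```python
# A minimal sketch using the zero-shot image classification pipeline;
# SigLIP-style models score each label independently via a sigmoid.
from transformers import pipeline

classifier = pipeline(
    task="zero-shot-image-classification",
    model="google/siglip2-base-patch16-224",  # example checkpoint
)

results = classifier(
    "cat.jpg",  # placeholder image path
    candidate_labels=["a photo of a cat", "a photo of a dog", "a photo of a car"],
)
print(results)
```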

Read more

Hugging Face, IISc partner to supercharge model building on India’s diverse languages

The Indian Institute of Science (IISc) and ARTPARK partner with Hugging Face to enable developers across the globe to access Vaani, India’s most diverse open-source, multi-modal, multi-lingual dataset. Both organisations share a commitment to building inclusive, accessible, and state-of-the-art AI technologies that honor linguistic and cultural diversity. The partnership between Hugging Face and IISc/ARTPARK aims to increase the accessibility and improve the usability of the Vaani dataset, encouraging the development of AI […]
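
A minimal sketch of pulling the data from the Hub is below; the repository id is a placeholder and the dataset may require a language/config name, so check the page published by ARTPARK/IISc for the actual identifiers:

```python
# Streaming avoids downloading the full multi-modal corpus up front.
from datasets import load_dataset

ds = load_dataset("ARTPARK-IISc/Vaani", split="train", streaming=True)  # hypothetical repo id

for example in ds.take(3):
    print(example)
```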

Read more

Trace & Evaluate your Agent with Arize Phoenix

So, you’ve built your agent. It takes in inputs and tools, processes them, and generates responses. Maybe it’s making decisions, retrieving information, executing tasks autonomously, or all three. But now comes the big question – how effectively is it performing? And more importantly, how do you know? Building an agent is one thing; understanding its behavior is another. That’s where tracing and evaluations come in. Tracing allows you to see exactly what your agent is doing step by step—what inputs […]
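
Here is a hedged sketch of wiring up tracing with Arize Phoenix; the package layout, project name, and instrumented client are assumptions, so see the Phoenix docs for the current API:

```python
# A minimal sketch: start the local Phoenix UI, register an OpenTelemetry
# tracer provider pointed at it, and instrument the OpenAI client so each
# LLM call your agent makes shows up as a span.
import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

px.launch_app()  # local Phoenix UI

tracer_provider = register(project_name="my-agent")  # hypothetical project name
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# From here, any OpenAI calls made by the agent are traced step by step.
```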

Read more

A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality

With the release of the Aya Vision family, our new 8B and 32B parameter vision-language models (VLMs), we are addressing one of the biggest challenges in AI: bringing multilingual performance to multimodal models. Aya Vision is Cohere For AI’s latest open-weight multilingual and multimodal model family, designed to be a strong foundation for language and vision understanding across 23 languages. It builds on the success of Aya Expanse, state-of-the-art multilingual language models, and extends it using a combination of advanced […]
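
A rough sketch of prompting Aya Vision through transformers follows; the model id, image URL, and message format are assumptions, so check the model card for exact usage:

```python
# A minimal sketch, assuming the 8B open-weight checkpoint and a hosted image.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "CohereForAI/aya-vision-8b"  # example checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder
        {"type": "text", "text": "Describe this image in Hindi."},
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```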

Read more