March 13, 2026 huggingface

Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition 🤖

Simon Alibert and Rémi Cadène from the LeRobot team with Reachy 1. Photo: Léa Crespi

Since Hugging Face started the LeRobot library in 2024, led by ex-Tesla lead Rémi Cadène, the Hugging Face Hub has quickly become the most widely used hub and software platform for open robotics with models, datasets, spaces and libraries. Today, we’re excited to take it a step further by welcoming Pollen Robotics to Hugging Face, a team that’s spent the last 9 years building […]

March 13, 2026 huggingface

Introducing HELMET: Holistically Evaluating Long-context Language Models

Contact: hyen@cs.princeton.edu
Paper: https://arxiv.org/abs/2410.02694
Website: https://princeton-nlp.github.io/HELMET
Code & Data: https://github.com/princeton-nlp/HELMET

Since we first released HELMET last October, there has been more development on long-context language models than ever before, and we are thrilled to see the adoption of HELMET by the community, such as Microsoft’s Phi-4 and AI21’s Jamba 1.6. After the initial release, we have added more models to our evaluation suite and conducted additional analyses. We are excited to share our new results and present HELMET at ICLR […]

March 13, 2026 huggingface

Cohere on Hugging Face Inference Providers 🔥

We’re thrilled to share that Cohere is now a supported Inference Provider on HF Hub! Cohere is also the first model creator to share and serve its models directly on the Hub. Cohere is committed to building and serving models purpose-built for enterprise use cases. Its comprehensive suite of secure AI solutions, from cutting-edge Generative AI to powerful Embeddings and Ranking models, is designed to tackle real-world business challenges. Additionally, Cohere Labs, Cohere’s in-house research lab, supports fundamental research and […]

March 13, 2026 huggingface

17 Reasons Why Gradio Isn’t Just Another UI Library

“Oh, Gradio? That’s a Python library for building UIs, right?” We hear this a lot, and while Gradio does let you create interactive UIs with minimal Python code, calling Gradio a “UI library” misses the bigger picture! Gradio is

March 13, 2026 huggingface

Prefill and Decode for Concurrent Requests – Optimizing LLM Performance

Handling load from multiple users in parallel is crucial for the performance of LLM applications. In the previous part of our series on LLM performance, we discussed queueing strategies for the prioritization of different users. In this second part, we will now focus on the concurrent processing of requests, and how it impacts relevant metrics such as latency

March 13, 2026 huggingface

Finetuning olmOCR to be a faithful OCR-Engine

At TNG, we created a fine-tune of an Optical Character Recognition model based on olmOCR to help us automate our internal document processing workflows. Recently, the Allen Institute for Artificial Intelligence

March 13, 2026 huggingface

Tiny Agents: an MCP-powered agent in 50 lines of code

New! (May 23, ’25) If you prefer Python, check out the companion post Tiny Agents in Python. Over the past few weeks, I’ve been diving into MCP (Model Context Protocol) to understand what the hype around it was all about. My TL;DR is that it’s fairly simple, but still quite powerful: MCP is a standard API

March 13, 2026 huggingface

PipelineRL

We are excited to open-source PipelineRL, an experimental RL implementation that tackles a fundamental challenge in large-scale Reinforcement Learning with LLMs: the trade-off between inference throughput and on-policy data collection. PipelineRL’s key innovation is inflight weight updates during RL training (see Figure 1 below). This allows PipelineRL to achieve constantly high inference throughput and minimize the lag between the weights used for rollouts and the most recently updated model weights. The result: fast and stable RL training for large language […]

March 13, 2026 huggingface

What is AutoRound?

As large language models (LLMs) and vision-language models (VLMs) continue to grow in size and complexity, deploying them efficiently becomes increasingly challenging. Quantization offers a solution by reducing model size and inference latency. Intel’s AutoRound emerges as a cutting-edge quantization tool that balances accuracy, efficiency, and compatibility. AutoRound is a weight-only post-training quantization (PTQ) method developed by Intel. It uses signed gradient descent to jointly optimize weight rounding and clipping ranges, enabling accurate low-bit quantization (e.g., INT2 – INT8) with […]

March 13, 2026 huggingface

The 4 Things Qwen-3’s Chat Template Teaches Us

What a boring Jinja snippet tells us about the new Qwen-3 model.

The new Qwen-3 model by Qwen ships with a much more sophisticated chat template than its predecessors Qwen-2.5 and QwQ. By taking a look at the differences in the Jinja template, we can find interesting insights into the new model.

Chat Templates
