Building the Hugging Face MCP Server

TL;DR: The Hugging Face Official MCP Server offers unique customization options for AI assistants accessing the Hub, along with access to thousands of AI applications through one simple URL. We used MCP’s “Streamable HTTP” transport for deployment, and we examine in detail the trade-offs server developers face. We’ve learned a great deal about building a useful MCP server over the last month – we’ll describe our journey here. […]

Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models

Numina & Kimi Team

Figure 1: Performance comparison of theorem proving models on the miniF2F-test dataset.

We’re excited to announce the release of Kimina-Prover-72B, our state-of-the-art theorem proving model trained with the Kimi k1.5 [1] RL pipeline based on Qwen2.5-72B [2]. Alongside it, we are also releasing two distilled variants: Kimina-Prover-Distill-8B and 1.7B (based on Qwen3-8B and Qwen3-1.7B [3] respectively). Our key innovations include: Test-Time Reinforcement Learning Search: a trainable agentic proving framework that enables the model to recursively discover, combine and […]

Migrating the Hub from Git LFS to Xet

In January of this year, Hugging Face’s Xet Team deployed a new storage backend, and shortly after shifted ~6% of Hub downloads through the infrastructure. This represented a significant milestone, but it was just the beginning. In 6 months, 500,000 repositories holding 20 PB joined the move to Xet as the Hub outgrows Git LFS and transitions to a storage system that scales with the workloads of AI builders. Today, more than 1 million people on the Hub are using […]

Ettin Suite: SoTA Paired Encoders and Decoders

What would happen if you took the ModernBERT recipe and applied it to a decoder-only model? Turns out, a state-of-the-art decoder language model that beats Llama 3.2 1B and SmolLM2! We introduce a new open-data training recipe to reproduce the encoder-only ModernBERT model (and actually beat it!). We then apply the exact same recipe to decoder-only models. For the first time, we have two state-of-the-art models trained in the same setup but with two different training objectives: masked language modeling […]

Five Big Improvements to Gradio MCP Servers

Gradio is an open-source Python package for creating AI-powered web applications. Gradio is compliant with the MCP server protocol and powers thousands of MCP servers hosted on Hugging Face Spaces. The Gradio team is betting big on Gradio and Spaces being the best way to build and host AI-powered MCP servers. To that end, here are […]

Back to The Future: Evaluating AI Agents on Predicting Future Events

Most current AI benchmarks focus on answering questions about the past, either by testing models on existing knowledge (in a static manner, such as HLE or GPQA, or augmented, like BrowseComp or GAIA) or previously solved problems (like PaperBench, DABStep, or most coding evaluations). However, we believe that more valuable AI, and ultimately AGI, will be distinguished by its ability to use this past to forecast interesting aspects of the future, rather than merely reciting old facts. Forecasting future events […]

Consilium: When Multiple LLMs Collaborate

Picture this: four AI experts sitting around a poker table, debating your toughest decisions in real-time. That’s exactly what Consilium, the multi-LLM platform I built during the Gradio Agents & MCP Hackathon, does. It lets AI models discuss complex questions and reach consensus through structured debate. The platform works both as a visual Gradio interface and as an MCP (Model Context Protocol) server […]

Accelerate a World of LLMs on Hugging Face with NVIDIA NIM

AI builders want a choice of the latest large language model (LLM) architectures and specialized variants for use in AI agents and other apps, but handling all this diversity can slow testing and deployment pipelines. In particular, managing and optimizing different inference software frameworks to achieve the best performance across varied LLMs and serving requirements is a time-consuming bottleneck […]
