Back to The Future: Evaluating AI Agents on Predicting Future Events

Most current AI benchmarks focus on answering questions about the past, either by testing models on existing knowledge (statically, as in HLE or GPQA, or augmented with browsing, as in BrowseComp or GAIA) or on previously solved problems (like PaperBench, DABStep, or most coding evaluations). However, we believe that more valuable AI, and ultimately AGI, will be distinguished by its ability to use the past to forecast interesting aspects of the future, rather than merely reciting old facts. Forecasting future events […]

Read more

Consilium: When Multiple LLMs Collaborate

Picture this: four AI experts sitting around a poker table, debating your toughest decisions in real time. That’s exactly what Consilium, the multi-LLM platform I built during the Gradio Agents & MCP Hackathon, does. It lets AI models discuss complex questions and reach consensus through structured debate. The platform works both as a visual Gradio interface and as an MCP (Model Context Protocol) server […]
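The teaser doesn't show Consilium's actual API, but the debate-then-consensus loop it describes can be sketched with stub "experts" standing in for the LLMs. Everything below is hypothetical illustration, not Consilium's implementation:

```python
from collections import Counter

def run_debate(models, question, rounds=2):
    """Structured debate: each model answers, then repeatedly sees the
    others' current answers and may revise; the group settles on the
    majority answer."""
    answers = {name: model(question, context=None) for name, model in models.items()}
    for _ in range(rounds):
        for name, model in models.items():
            others = {n: a for n, a in answers.items() if n != name}
            answers[name] = model(question, context=others)
    # Consensus = most common final answer (majority vote).
    return Counter(answers.values()).most_common(1)[0][0]

# Stub "experts": one stubborn model and two that defer to the majority they see.
def stubborn(question, context):
    return "A"

def conformist(question, context):
    if context:
        return Counter(context.values()).most_common(1)[0][0]
    return "B"

models = {"m1": stubborn, "m2": conformist, "m3": conformist}
print(run_debate(models, "Which option?"))  # → A
```

In a real deployment each stub would be an LLM call, and the consensus step could be an explicit moderator model rather than a simple vote.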

Read more

Accelerate a World of LLMs on Hugging Face with NVIDIA NIM

AI builders want a choice of the latest large language model (LLM) architectures and specialized variants for use in AI agents and other apps, but handling all this diversity can slow testing and deployment pipelines. In particular, managing and optimizing different inference software frameworks to achieve the best performance across varied LLMs and serving requirements is a time-consuming bottleneck […]

Read more

TimeScope: How Long Can Your Video Large Multimodal Model Go?

TimeScope is an open-source benchmark designed to measure how well vision-language models understand long videos. By inserting short “needle” clips into videos ranging from 1 minute to 8 hours, it evaluates three skills: localized retrieval, information synthesis, and fine-grained temporal perception. TimeScope reveals that many state-of-the-art models still struggle with true temporal comprehension. Recent advances in multimodal AI have produced models claiming to understand hour-long videos. This trend mirrors progress in long-context language models […]
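The needle-insertion idea behind the localized-retrieval skill can be sketched with a toy harness. This is not TimeScope's actual data format or scoring code; the video is modeled as a list of per-minute segment labels, and all names are illustrative:

```python
def make_haystack(minutes, needle, position):
    """Simulate a long video as per-minute segment labels, with a short
    "needle" clip spliced in at a known minute."""
    segments = [f"background_{i}" for i in range(minutes)]
    segments.insert(position, needle)
    return segments

def localized_retrieval_score(predicted, truth, tolerance=1):
    """1 if the model places the needle within `tolerance` minutes, else 0."""
    return int(abs(predicted - truth) <= tolerance)

# An ~8-hour video (480 minutes) with the needle at minute 137.
video = make_haystack(minutes=480, needle="needle_clip", position=137)

# A trivial "model" that actually scans every segment finds it exactly.
predicted = video.index("needle_clip")
print(localized_retrieval_score(predicted, 137))  # → 1
```

A real evaluation would feed the full video to a multimodal model and ask for the needle's timestamp; the point of the long haystack is that scanning every segment is precisely what current models fail to do reliably.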

Read more

Parquet Content-Defined Chunking

Reduce Parquet file upload and download times on the Hugging Face Hub by leveraging the new Xet storage layer and Apache Arrow’s Parquet Content-Defined Chunking (CDC) feature, enabling more efficient and scalable data workflows. TL;DR: Parquet Content-Defined Chunking (CDC) is now available in PyArrow and Pandas, enabling efficient deduplication of Parquet files on content-addressable storage systems like Hugging Face’s Xet storage layer. CDC […]
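The deduplication win comes from where chunk boundaries are placed. A minimal sketch of the general CDC idea (not PyArrow's actual implementation, and with hypothetical window/mask parameters) shows why content-defined boundaries survive an insertion that would shift every fixed-offset chunk:

```python
import hashlib
import random

def cdc_chunks(data: bytes, window=8, mask_bits=6):
    """Content-defined chunking sketch: cut wherever a hash of the trailing
    `window` bytes has its low `mask_bits` bits all zero, so boundaries are
    determined by content rather than byte offsets (~64-byte chunks here)."""
    chunks, start = [], 0
    for i in range(window, len(data)):
        h = hashlib.blake2b(data[i - window:i], digest_size=4).digest()
        if int.from_bytes(h, "big") & ((1 << mask_bits) - 1) == 0:
            chunks.append(data[start:i])
            start = i
    chunks.append(data[start:])
    return chunks

rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(4096))
edited = data[:1000] + b"<<edit>>" + data[1000:]  # insert bytes mid-file

a = {hashlib.sha256(c).hexdigest() for c in cdc_chunks(data)}
b = {hashlib.sha256(c).hexdigest() for c in cdc_chunks(edited)}
print(f"chunks shared after the edit: {len(a & b)} of {len(a)}")
```

With fixed-offset chunking, the 8-byte insertion would shift every later boundary and almost no chunks would match; with content-defined boundaries, only the chunks touching the edited region change, which is what lets Xet-backed storage upload just the new bytes.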

Read more

Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face

TL;DR: Trackio is a new, open-source, and free experiment tracking Python library that provides a local dashboard and seamless integration with Hugging Face Spaces for easy sharing and collaboration. Since Trackio is a drop-in replacement for wandb, you can get started with the syntax you already know! Background: if you have trained your own machine learning model, you know how important it is to track metrics, parameters, and hyperparameters during training and visualize […]
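Since the headline feature is wandb-compatible syntax, here is a hypothetical in-memory stand-in sketching that `init` / `log` / `finish` call pattern. The class below is illustrative only, not Trackio's implementation; with Trackio installed, the stand-in would simply be `import trackio as wandb`:

```python
class MiniTracker:
    """Hypothetical stand-in for the wandb-style interface that a drop-in
    replacement must mirror: init a run, log metric dicts, finish."""
    def __init__(self):
        self.runs = []
        self.run = None

    def init(self, project, config=None):
        self.run = {"project": project, "config": config or {}, "history": []}
        self.runs.append(self.run)
        return self.run

    def log(self, metrics):
        self.run["history"].append(dict(metrics))

    def finish(self):
        self.run = None

wandb = MiniTracker()  # with Trackio: `import trackio as wandb`
wandb.init(project="demo", config={"lr": 1e-3})
for step in range(3):
    wandb.log({"step": step, "loss": 1.0 / (step + 1)})
wandb.finish()
print(len(wandb.runs[0]["history"]))  # → 3
```

Because the training loop only ever calls `init`, `log`, and `finish`, swapping the import line is the entire migration, which is exactly the "syntax you already know" promise.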

Read more

📚 3LM: A Benchmark for Arabic LLMs in STEM and Code

Why 3LM? Arabic Large Language Models (LLMs) have made notable progress in recent years, yet existing benchmarks fall short when it comes to evaluating performance in high-value technical domains. Most evaluations to date have focused on general-purpose tasks like summarization, sentiment analysis, or generic question answering. However, scientific reasoning and programming are essential for a broad range of real-world applications, from education to technical problem solving. To address this gap, we introduce 3LM (علم) […]

Read more