Expert Support case study: Bolstering a RAG app with LLM-as-a-Judge

This is a guest blog post authored by Digital Green. Digital Green is participating in a CGIAR-led collaboration to bring agricultural support to smallholder farmers. There are an estimated 500 million smallholder farmers globally, and they play a critical role in global food security. Timely access to accurate information is essential for these farmers to make informed decisions and improve their yields. An “agricultural extension service” offers farmers technical advice on agriculture and supplies them with the necessary inputs […]

Read more

Universal Assisted Generation: Faster Decoding with Any Assistant Model

TL;DR: Many LLMs such as gemma-2-9b and Mixtral-8x22B-Instruct-v0.1 lack a much smaller version to use for assisted generation. In this blog post, we present Universal Assisted Generation: a method developed by Intel Labs and Hugging Face which extends assisted generation to work with a small language model from any model family 🤯. As a result, it is now possible to accelerate inference from any decoder or Mixture of Experts model by 1.5x-2.0x with almost zero overhead 🔥🔥🔥. Let’s dive in! […]
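The technique above maps onto a single `generate` call in recent transformers releases (the cross-family arguments assume transformers >= 4.46). A minimal sketch follows; gemma-2-9b is named in the post, but the small assistant checkpoint is an illustrative assumption, not the one the authors used:

```python
def uag_generate_kwargs(assistant_model, target_tokenizer, assistant_tokenizer):
    """Extra kwargs for `model.generate` that enable universal assisted
    generation: because the assistant can come from a different model
    family, both tokenizers must be supplied so drafted tokens can be
    re-encoded into the target model's vocabulary."""
    return {
        "assistant_model": assistant_model,
        "tokenizer": target_tokenizer,
        "assistant_tokenizer": assistant_tokenizer,
    }

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # gemma-2-9b is named in the post; the compact assistant below is an
    # assumption -- any small causal LM from any family should work.
    target_id = "google/gemma-2-9b"
    assistant_id = "Qwen/Qwen2-0.5B-Instruct"

    target = AutoModelForCausalLM.from_pretrained(target_id)
    tok = AutoTokenizer.from_pretrained(target_id)
    assistant = AutoModelForCausalLM.from_pretrained(assistant_id)
    assistant_tok = AutoTokenizer.from_pretrained(assistant_id)

    inputs = tok("The quick brown fox", return_tensors="pt")
    out = target.generate(
        **inputs,
        max_new_tokens=32,
        **uag_generate_kwargs(assistant, tok, assistant_tok),
    )
    print(tok.batch_decode(out, skip_special_tokens=True)[0])
```

The target model still verifies every drafted token, so the output distribution matches unassisted decoding; only the wall-clock time changes.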

Read more

Argilla 2.4: Easily Build Fine-Tuning and Evaluation Datasets on the Hub — No Code Required

We are incredibly excited to share the most impactful feature since Argilla joined Hugging Face: you can now prepare your AI datasets without writing any code, starting from any Hub dataset! Using Argilla’s UI, you can easily import a dataset from the Hugging Face Hub, define questions, and start collecting human feedback. Not familiar with Argilla? Argilla is a free, open-source, data-centric tool that lets AI developers and domain experts collaborate to build high-quality datasets. Argilla is part of the […]

Read more

Hugging Face + PyCharm

It’s a Tuesday morning. As a Transformers maintainer, I’m doing the same thing I do most weekday mornings: Opening PyCharm, loading up the Transformers codebase and gazing lovingly at the chat template documentation while ignoring the 50 user issues I was pinged on that day. But this time, something feels different: Something is…    

Read more

Judge Arena: Benchmarking LLMs as Evaluators

LLM-as-a-Judge has emerged as a popular way to grade natural language outputs from LLM applications, but how do we know which models make the best judges? We’re excited to launch Judge Arena, a platform that lets anyone easily compare models as judges side by side. Just run the judges on a test sample and vote for the judge you agree with most. The results will be organized into a leaderboard that displays the best judges. […]

Read more

Introduction to the Open Leaderboard for Japanese LLMs

LLMs are now increasingly capable in English, but it’s hard to know how well they perform in other widely spoken languages, each of which presents its own set of linguistic challenges. Today, we are excited to fill this gap for Japanese! We’d like to announce the Open Japanese LLM Leaderboard, composed of more than 20 datasets spanning classical to modern NLP tasks, built to shed light on the underlying mechanisms of Japanese LLMs. The Open Japanese LLM Leaderboard was built by LLM-jp, […]

Read more

Faster Text Generation with Self-Speculative Decoding

Self-speculative decoding, proposed in LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding, is a novel approach to text generation. It combines the strengths of speculative decoding with early exiting from a large language model (LLM): the same model’s early layers are used to draft tokens, and its later layers to verify them. This technique not only speeds up text generation but also achieves significant memory savings and reduces computational latency. In order to obtain an […]
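In recent transformers releases this takes a single extra argument to `generate` (assuming a version with the `assistant_early_exit` option and a LayerSkip-trained checkpoint; the model name and exit layer below are illustrative assumptions):

```python
def self_speculative_kwargs(early_exit_layer):
    """Extra kwarg for `model.generate` that enables self-speculative
    decoding: the model drafts tokens by exiting after the given layer,
    then verifies them with its remaining layers -- no separate draft
    model is loaded, which is where the memory savings come from."""
    if early_exit_layer < 1:
        raise ValueError("early exit layer must be >= 1")
    return {"assistant_early_exit": early_exit_layer}

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Checkpoint name and exit layer are illustrative; this only pays off
    # with models trained for early exit, such as the LayerSkip releases.
    model_id = "facebook/layerskip-llama3.2-1B"
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tok("Speculative decoding works by", return_tensors="pt")
    out = model.generate(
        **inputs,
        max_new_tokens=32,
        **self_speculative_kwargs(4),
    )
    print(tok.batch_decode(out, skip_special_tokens=True)[0])
```

As with ordinary speculative decoding, the full model verifies every drafted token, so quality is unchanged; the exit layer is a speed/acceptance-rate trade-off worth tuning per model.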

Read more

Letting Large Models Debate: The First Multilingual LLM Debate Competition

Static evaluations and user-driven arenas have shown their limitations and biases over the past year. Here, we explore a novel way to evaluate LLMs: debate. Debate is an excellent showcase of reasoning strength and language ability, practiced throughout history, from the Athenian Ecclesia in the 5th century BCE to today’s World Universities Debating Championship. Do today’s large language models exhibit debate skills similar to humans? Which model is currently the best at debating? What […]

Read more

Rearchitecting Hugging Face Uploads and Downloads

As part of the Hugging Face Xet team’s work to improve the Hub’s storage backend, we analyzed a 24-hour window of Hugging Face upload requests to better understand access patterns. On October 11th, 2024, we saw:

- Uploads from 88 countries
- 8.2 million upload requests
- 130.8 TB of data transferred

The map below visualizes this activity, with countries colored by bytes uploaded per hour. Currently, uploads are stored in an S3 bucket in us-east-1 and optimized using S3 Transfer Acceleration. […]

Read more