Welcome Llama 3 – Meta’s new open LLM

Meta’s Llama 3, the next iteration of the open-access Llama family, is now released and available at Hugging Face. It’s great to see Meta continuing its commitment to open AI, and we’re excited to fully support the launch with comprehensive integration in the Hugging Face ecosystem. Llama 3 comes in two sizes: 8B for efficient deployment and development on a consumer-size GPU, and 70B for large-scale AI native applications. Both come in base and instruction-tuned variants. In addition to the 4 […]

Read more

The Open Medical-LLM Leaderboard: Benchmarking Large Language Models in Healthcare

Over the years, Large Language Models (LLMs) have emerged as a groundbreaking technology with immense potential to revolutionize various aspects of healthcare. These models, such as GPT-3, GPT-4, and Med-PaLM 2, have demonstrated remarkable capabilities in understanding and generating human-like text, making them valuable tools for tackling complex medical tasks and improving patient care. They have notably shown promise in various medical applications, such as medical question-answering (QA), dialogue systems, and text generation. Moreover, with the exponential growth of electronic […]

Read more

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

We’re excited to share Jack of All Trades (JAT), a project that aims to move in the direction of a generalist agent. The project started as an open reproduction of the Gato (Reed et al., 2022) work, which proposed to train a Transformer able to perform both vision-and-language and decision-making tasks. We thus started by building an open version of Gato’s dataset. We then trained multi-modal Transformer models on it, introducing several improvements over Gato for handling sequential data and […]

Read more

Introducing the Open Chain of Thought Leaderboard

Chain-of-thought prompting is emerging as a powerful and effective design pattern for LLM-based apps and agents. The basic idea of chain-of-thought prompting is to let a model generate a step-by-step solution (“reasoning trace”) before answering a question or taking a decision. With the Open CoT Leaderboard we’re tracking LLMs’ ability to generate effective chain-of-thought traces for challenging reasoning tasks. Unlike most performance-based leaderboards, we’re not scoring the absolute accuracy a model achieves on a given task, but the difference […]

Read more

StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation

Instruction tuning is a fine-tuning approach that gives large language models (LLMs) the capability to follow natural, human-written instructions. However, for programming tasks, most models are tuned on either human-written instructions (which are very expensive) or instructions generated by huge and proprietary LLMs (which may not be permitted). We introduce StarCoder2-15B-Instruct-v0.1, the very first entirely self-aligned code LLM trained with a fully permissive and transparent pipeline. Our open-source pipeline uses StarCoder2-15B to generate thousands of instruction-response pairs, which […]

Read more

Improving Prompt Consistency with Structured Generations

Recently, the Leaderboards and Evals research team at Hugging Face ran small experiments that highlighted how fickle evaluation can be. For a given task, results are extremely sensitive to minuscule changes in prompt format! However, this is not what we want: a model prompted with the same amount of information as input should output similar results. We discussed this with our friends at Dottxt, who had an idea – what if there was a way to increase consistency across prompt […]

Read more

Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints

Whisper is one of the best open-source speech recognition models and definitely the most widely used one. Hugging Face Inference Endpoints make it very easy to deploy any Whisper model out of the box. However, if you’d like to introduce additional features, like a diarization pipeline to identify speakers, or assisted generation for speculative decoding, things get trickier. The reason is that you need to combine Whisper with additional models, while still exposing a single API endpoint. We’ll solve […]

Read more

Bringing the Artificial Analysis LLM Performance Leaderboard to Hugging Face

Building applications with LLMs requires considering more than just quality: for many use cases, speed and price are equally or more important. For consumer applications and chat experiences, speed and responsiveness are critical to user engagement. Users expect near-instant responses, and delays can directly lead to reduced engagement. When building more complex applications involving tool use or agentic systems, speed and cost become even more important, and can become the limiting factor on overall system capability. The time taken by sequential […]

Read more

Introducing the Open Leaderboard for Hebrew LLMs!

This project addresses the critical need for advancement in Hebrew NLP. As Hebrew is considered a low-resource language, existing LLM leaderboards often lack benchmarks that accurately reflect its unique characteristics. Today, we are excited to introduce a pioneering effort to change this narrative — our new open LLM leaderboard, specifically designed to evaluate and enhance language models in Hebrew. Hebrew is a morphologically rich language with a complex system of roots and patterns. Words are built from roots with prefixes, […]

Read more

Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon

Retrieval-augmented generation (RAG) enhances text generation with a large language model by incorporating fresh domain knowledge stored in an external datastore. Separating your company data from the knowledge learned by language models during training is essential to balance performance, accuracy, security, and privacy goals. In this blog, you will learn how Intel can help you develop and deploy RAG applications as part of OPEA, the Open Platform for Enterprise AI. You will also discover how Intel Gaudi 2 AI accelerators […]

Read more