An Introduction to AI Secure LLM Safety Leaderboard

Given the widespread adoption of LLMs, it is critical to understand their safety and risks in different scenarios before extensive deployments in the real world. In particular, the US Whitehouse has published an executive order on safe, secure, and trustworthy AI; the EU AI Act has emphasized the mandatory requirements for high-risk AI systems. Together with regulations, it is important to provide technical solutions to assess the risks of AI systems, enhance their safety, and potentially provide safe and aligned […]

Read more

The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models

In the rapidly evolving field of Natural Language Processing (NLP), Large Language Models (LLMs) have become central to AI’s ability to understand and generate human language. However, a significant challenge that persists is their tendency to hallucinate — i.e., producing content that may not align with real-world facts or the user’s input. With the constant release of new open-source models, identifying the most reliable ones, particularly in terms of their propensity to generate hallucinated content, becomes crucial. The Hallucinations Leaderboard […]

Read more

Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding

Recently, code generation models have become very popular, especially with the release of state-of-the-art open-source models such as BigCode’s StarCoder and Meta AI’s Code Llama. A growing number of works focuses on making Large Language Models (LLMs) more optimized and accessible. In this blog, we are happy to share the latest results of LLM optimization on Intel Xeon focusing on the popular code generation LLM, StarCoder. The StarCoder Model is a cutting-edge LLM specifically designed for assisting the user with […]

Read more

Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases

Today, the Patronus team is excited to announce the new Enterprise Scenarios Leaderboard, built using the Hugging Face Leaderboard Template in collaboration with their teams. The leaderboard aims to evaluate the performance of language models on real-world enterprise use cases. We currently support 6 diverse tasks – FinanceBench, Legal Confidentiality, Creative Writing, Customer Support Dialogue, Toxicity, and Enterprise PII. We measure the performance of models on metrics like accuracy, engagingness, toxicity, relevance, and Enterprise PII. Why do    

Read more

Patch Time Series Transformer in Hugging Face – Getting Started

In this blog, we provide examples of how to get started with PatchTST. We first demonstrate the forecasting capability of PatchTST on the Electricity data. We will then demonstrate the transfer learning capability of PatchTST by using the previously trained model to do zero-shot forecasting on the electrical transformer (ETTh1) dataset. The zero-shot forecasting performance will denote the test performance of the model in the target domain, without any training on the target domain. Subsequently, we will do linear probing […]

Read more

Constitutional AI with Open LLMs

Since the launch of ChatGPT in 2022, we have seen tremendous progress in LLMs, ranging from the release of powerful pretrained models like Llama 2 and Mixtral, to the development of new alignment techniques like Direct Preference Optimization. However, deploying LLMs in consumer applications poses several challenges, including the need to add guardrails that prevent the model from generating undesirable responses. For example, if you are building an AI tutor for children, then you don’t want it to generate toxic […]

Read more

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

We’re happy to introduce the NPHardEval leaderboard, using NPHardEval, a cutting-edge benchmark developed by researchers from the University of Michigan and Rutgers University. NPHardEval introduces a dynamic, complexity-based framework for assessing Large Language Models’ (LLMs) reasoning abilities. It poses 900 algorithmic questions spanning the NP-Hard complexity class and lower, designed to rigorously test LLMs, and is updated on a monthly basis to prevent overfitting! A Unique Approach to LLM Evaluation NPHardEval stands apart    

Read more

SegMoE: Segmind Mixture of Diffusion Experts

SegMoE is an exciting framework for creating Mixture-of-Experts Diffusion models from scratch! SegMoE is comprehensively integrated within the Hugging Face ecosystem and comes supported with diffusers 🔥! Among the features and integrations being released today: Table of Contents What is SegMoE? SegMoE models follow the same architecture as Stable Diffusion. Like Mixtral 8x7b, a SegMoE model    

Read more

From OpenAI to Open LLMs with Messages API on Hugging Face

We are excited to introduce the Messages API to provide OpenAI compatibility with Text Generation Inference (TGI) and Inference Endpoints. Starting with version 1.4.0, TGI offers an API compatible with the OpenAI Chat Completion API. The new Messages API allows customers and users to transition seamlessly from OpenAI models to open LLMs. The API can be directly used with OpenAI’s client libraries or third-party tools, like LangChain or LlamaIndex. “The new Messages API with OpenAI compatibility makes it easy   […]

Read more
1 31 32 33 34 35 70