Open-source LLMs as LangChain Agents

Open-source LLMs have now reached a performance level that makes them suitable reasoning engines for powering agent workflows: Mixtral even surpasses GPT-3.5 on our benchmark, and its performance could easily be further enhanced with fine-tuning. We’ve released the simplest agentic library out there: smolagents! Go checkout the smolagents introduction blog here. Introduction Large Language Models (LLMs) trained for causal language modeling can tackle a wide range of    

Read more

An Introduction to AI Secure LLM Safety Leaderboard

Given the widespread adoption of LLMs, it is critical to understand their safety and risks in different scenarios before extensive deployments in the real world. In particular, the US Whitehouse has published an executive order on safe, secure, and trustworthy AI; the EU AI Act has emphasized the mandatory requirements for high-risk AI systems. Together with regulations, it is important to provide technical solutions to assess the risks of AI systems, enhance their safety, and potentially provide safe and aligned […]

Read more

The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models

In the rapidly evolving field of Natural Language Processing (NLP), Large Language Models (LLMs) have become central to AI’s ability to understand and generate human language. However, a significant challenge that persists is their tendency to hallucinate — i.e., producing content that may not align with real-world facts or the user’s input. With the constant release of new open-source models, identifying the most reliable ones, particularly in terms of their propensity to generate hallucinated content, becomes crucial. The Hallucinations Leaderboard […]

Read more

Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding

Recently, code generation models have become very popular, especially with the release of state-of-the-art open-source models such as BigCode’s StarCoder and Meta AI’s Code Llama. A growing number of works focuses on making Large Language Models (LLMs) more optimized and accessible. In this blog, we are happy to share the latest results of LLM optimization on Intel Xeon focusing on the popular code generation LLM, StarCoder. The StarCoder Model is a cutting-edge LLM specifically designed for assisting the user with […]

Read more

Introducing the Enterprise Scenarios Leaderboard: a Leaderboard for Real World Use Cases

Today, the Patronus team is excited to announce the new Enterprise Scenarios Leaderboard, built using the Hugging Face Leaderboard Template in collaboration with their teams. The leaderboard aims to evaluate the performance of language models on real-world enterprise use cases. We currently support 6 diverse tasks – FinanceBench, Legal Confidentiality, Creative Writing, Customer Support Dialogue, Toxicity, and Enterprise PII. We measure the performance of models on metrics like accuracy, engagingness, toxicity, relevance, and Enterprise PII. Why do    

Read more

Patch Time Series Transformer in Hugging Face – Getting Started

In this blog, we provide examples of how to get started with PatchTST. We first demonstrate the forecasting capability of PatchTST on the Electricity data. We will then demonstrate the transfer learning capability of PatchTST by using the previously trained model to do zero-shot forecasting on the electrical transformer (ETTh1) dataset. The zero-shot forecasting performance will denote the test performance of the model in the target domain, without any training on the target domain. Subsequently, we will do linear probing […]

Read more

Constitutional AI with Open LLMs

Since the launch of ChatGPT in 2022, we have seen tremendous progress in LLMs, ranging from the release of powerful pretrained models like Llama 2 and Mixtral, to the development of new alignment techniques like Direct Preference Optimization. However, deploying LLMs in consumer applications poses several challenges, including the need to add guardrails that prevent the model from generating undesirable responses. For example, if you are building an AI tutor for children, then you don’t want it to generate toxic […]

Read more

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

We’re happy to introduce the NPHardEval leaderboard, using NPHardEval, a cutting-edge benchmark developed by researchers from the University of Michigan and Rutgers University. NPHardEval introduces a dynamic, complexity-based framework for assessing Large Language Models’ (LLMs) reasoning abilities. It poses 900 algorithmic questions spanning the NP-Hard complexity class and lower, designed to rigorously test LLMs, and is updated on a monthly basis to prevent overfitting! A Unique Approach to LLM Evaluation NPHardEval stands apart    

Read more
1 31 32 33 34 35 1,021