Introduction to ggml

ggml is a machine learning (ML) library written in C and C++ with a focus on Transformer inference. The project is open-source and is being actively developed by a growing community. ggml is similar to ML libraries such as PyTorch and TensorFlow, though it is still in its early stages of development and some of its fundamentals are still changing rapidly. Over time, ggml has gained popularity alongside other projects like llama.cpp and whisper.cpp. Many other projects also use ggml […]

Read more

A failed experiment: Infini-Attention, and why we should keep trying?

TLDR: Infini-attention’s performance degrades as we increase the number of times we compress the memory, and, to the best of our knowledge, ring attention, YaRN, and RoPE scaling are still the best ways to extend a pretrained model to a longer context length. Section 0: Introduction The context length of a language model is one of its central attributes, alongside its performance. Since the emergence of in-context learning, adding relevant information to […]
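The RoPE-scaling baseline mentioned in the TLDR can be sketched in a few lines. This is an illustrative toy, not the post’s implementation: linear position interpolation simply squeezes out-of-range positions back into the range the model was trained on before computing the rotary angles. The function name and defaults are ours.

```python
def rope_angles(pos, dim, base=10000.0, scale=1.0):
    """Rotary embedding angles for one position.

    `scale` > 1 applies linear position interpolation: positions are
    divided by `scale` so a longer context maps onto the angle range
    seen during pretraining.
    """
    return [(pos / scale) * base ** (-2 * i / dim) for i in range(dim // 2)]

# Position 8192 falls outside a 4096-token training range; interpolating
# with scale=2 maps it onto the same angles as position 4096 unscaled.
assert rope_angles(8192, dim=64, scale=2.0) == rope_angles(4096, dim=64)
```

In practice the scaling is applied inside the model’s rotary embedding module (and YaRN further adjusts the per-frequency scaling), but the core idea is this one division.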

Read more

Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI

Meta Llama 3.1 is the latest open LLM from Meta, released in July 2024. Meta Llama 3.1 comes in three sizes: 8B for efficient deployment and development on consumer-sized GPUs, 70B for large-scale AI-native applications, and 405B for synthetic data generation, LLM-as-a-Judge, or distillation, among other use cases. Some of its key features include: a large context length of 128K tokens (vs the original 8K), multilingual capabilities, tool usage capabilities, and a more permissive license. In this blog […]

Read more

The 5 Most Under-Rated Tools on Hugging Face

tl;dr: The Hugging Face Hub offers a number of often-overlooked tools and integrations that can make it easier to build many types of AI solutions. The Hugging Face Hub boasts over 850K public models, with ~50K new ones added every month, and that number just keeps climbing. We also offer an Enterprise Hub subscription […]

Read more

Scaling robotics datasets with video encoding

Over the past few years, text and image-based models have seen dramatic performance improvements, primarily due to scaling up model weights and dataset sizes. While the internet provides an extensive database of text and images for LLMs and image generation models, robotics lacks a comparably vast and diverse source of high-quality data, as well as efficient data formats. Despite efforts like Open X, we are still far from achieving the scale and diversity seen with Large Language Models. Additionally, we lack the necessary […]

Read more

Hugging Face partners with TruffleHog to Scan for Secrets

We’re excited to announce our partnership and integration with Truffle Security, bringing TruffleHog’s powerful secret scanning features to our platform as part of our ongoing commitment to security. TruffleHog is an open-source tool that detects and verifies secret leaks in code. With a wide range of detectors for popular SaaS and cloud providers, it scans files and repositories for […]

Read more

Accelerate 1.0.0

Three and a half years ago, Accelerate was a simple framework aimed at making training on multi-GPU and TPU systems easier, providing a low-level abstraction that simplified a raw PyTorch training loop. Since then, Accelerate has expanded into a multi-faceted library tackling many common problems of large-scale training and large models, in an age where 405-billion-parameter models (Llama) are the new language model size. This involves: a flexible low-level training API, allowing for training on six different hardware accelerators […]

Read more

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

As Large Language Models (LLMs) grow in size and complexity, finding ways to reduce their computational and energy costs has become a critical challenge. One popular solution is quantization, where the precision of parameters is reduced from the standard 16-bit floating-point (FP16) or 32-bit floating-point (FP32) to lower-bit formats like 8-bit or 4-bit. While this approach significantly cuts down on memory usage and speeds up computation, it often comes at the expense of accuracy. Reducing the precision too much can […]

Read more

Optimize and deploy models with Optimum-Intel and OpenVINO GenAI

Deploying Transformers models at the edge or client-side requires careful consideration of performance and compatibility. Python, though powerful, is not always ideal for such deployments, especially in environments dominated by C++. This blog will guide you through optimizing and deploying Hugging Face Transformers models using Optimum-Intel and OpenVINO™ GenAI, ensuring efficient AI inference with minimal dependencies. Table of Contents Why Use OpenVINO™ for Edge Deployment Step 1: Setting Up the Environment Step 2: Exporting Models to OpenVINO IR […]

Read more

Exploring the Daily Papers Page on Hugging Face

In the fast-paced world of research, staying up-to-date with the latest advancements is crucial. To help developers and researchers keep a pulse on the cutting edge of AI, Hugging Face introduced the Daily Papers page. Since its launch, Daily Papers has featured high-quality research selected by AK and researchers from the community. Over the past year, more than 3,700 papers have […]

Read more