March 13, 2026 huggingface

GaLore: Advancing Large Model Training on Consumer-grade Hardware

The integration of GaLore into the training of large language models (LLMs) marks a significant advancement in the field of deep learning, particularly in terms of memory efficiency and the democratization of AI research. By allowing for the training of billion-parameter models on consumer-grade hardware, reducing memory footprint in optimizer states, and leveraging advanced projection matrix techniques, GaLore opens new horizons for researchers and practitioners with limited access to high-end computational resources. Scaling LLMs with Consumer-Grade Hardware The

March 13, 2026 huggingface

Cosmopedia: how to create large-scale synthetic data for pre-training

In this blog post, we outline the challenges and solutions involved in generating a synthetic dataset with billions of tokens to replicate Phi-1.5, leading to the creation of Cosmopedia. Synthetic data has become a central topic in Machine Learning. It refers to artificially generated data, for instance by large language models (LLMs), to mimic real-world data. Traditionally, creating datasets for supervised fine-tuning and instruction-tuning required the costly and time-consuming process of hiring human annotators. This practice entailed significant resources, limiting […]

March 13, 2026 huggingface

A Chatbot on your Laptop: Phi-2 on Intel Meteor Lake

Because of their impressive abilities, large language models (LLMs) require significant computing power, which is seldom available on personal computers. Consequently, we have no choice but to deploy them on powerful bespoke AI servers hosted on-premises or in the cloud. Why local LLM inference is desirable What if we could run state-of-the-art open-source LLMs on a typical personal computer? Wouldn’t we enjoy benefits like: Increased privacy: our data would

March 13, 2026 huggingface

Introducing the Chatbot Guardrails Arena

With the recent advancements in augmented LLM capabilities, deployment of enterprise AI assistants (such as chatbots and agents) with access to internal databases is likely to increase; this trend could help with many tasks, from internal document summarization to personalized customer and employee support. However, data privacy of said databases can be a serious concern (see 1, 2 and 3) when deploying these models in production. So far, guardrails have emerged as the widely accepted technique to ensure the quality, […]

March 13, 2026 huggingface

Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval

We introduce the concept of embedding quantization and showcase their impact on retrieval speed, memory usage, disk space, and cost. We’ll discuss how embeddings can be quantized in theory and in practice, after which we introduce a demo showing a real-life retrieval scenario of 41 million Wikipedia texts. Table of Contents Why Embeddings? Embeddings are one

March 13, 2026 huggingface

Total noob’s intro to Hugging Face Transformers

Welcome to “A Total Noob’s Introduction to Hugging Face Transformers,” a guide designed specifically for those looking to understand the bare basics of using open-source ML. Our goal is to demystify what Hugging Face Transformers is and how it works, not to turn you into a machine learning practitioner, but to enable better understanding of and collaboration with those who are. That

March 13, 2026 huggingface

Pollen-Vision: Unified interface for Zero-Shot vision models in robotics

This is a guest blog post by the Pollen Robotics team. We are the creators of Reachy, an open-source humanoid robot designed for manipulation in the real world. In the context of autonomous behaviors, the essence of a robot’s usability lies in its ability to understand and interact with its environment. This understanding primarily comes from visual perception, which enables robots to identify objects, recognize people, navigate spaces, and much more. We’re excited to share the initial launch of our […]

March 13, 2026 huggingface

Bringing serverless GPU inference to Hugging Face users

Update (November 2024): The integration is no longer available. Please switch to the Hugging Face Inference API, Inference Endpoints, or other deployment options for your AI model needs. Today, we are thrilled to announce the launch of Deploy on Cloudflare Workers AI, a new integration on the Hugging Face Hub. Deploy on Cloudflare Workers AI makes using open models as a serverless API easy, powered by state-of-the-art GPUs deployed in Cloudflare edge data centers. Starting today, we are integrating some […]

March 13, 2026 huggingface

Blazing Fast SetFit Inference with 🤗 Optimum Intel on Xeon

SetFit is a promising solution for a common modeling problem: how to deal with lack of labeled data for training. Developed with Hugging Face’s research partners at Intel Labs and the UKP Lab, SetFit is an efficient framework for few-shot fine-tuning of Sentence Transformers models. SetFit achieves high accuracy with little labeled data – for example, SetFit outperforms GPT-3.5 in 3-shot prompting and with 5 shot it also outperforms 3-shot GPT-4 on the Banking 77 financial intent dataset. Compared to […]

March 13, 2026 huggingface

Text2SQL using Hugging Face Dataset Viewer API and Motherduck DuckDB-NSQL-7B

Today, integrating AI-powered features, particularly leveraging Large Language Models (LLMs), has become increasingly prevalent across various tasks such as text generation, classification, image-to-text, image-to-image transformations, etc. Developers are increasingly recognizing these applications’ potential benefits, particularly in enhancing core tasks such as scriptwriting, web development, and, now, interfacing with data. Historically, crafting insightful SQL queries for data analysis was primarily the domain of data analysts, SQL developers, data engineers, or professionals in related fields, all navigating the nuances of SQL dialect […]

« 1 … 34 35 36 37 38 … 1,021 »