March 13, 2026 huggingface

Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

Today, at H Company, we are releasing Holo1, a family of Action Vision Language Models (VLMs), and WebClick, a new multimodal localization benchmark, on the Hugging Face Hub. Surfer-H, a web-native

March 13, 2026 huggingface

Real-Time AI Sound Generation on Arm: A Personal Tool for Creative Freedom

By Michael Gamble, Partner & Ecosystem Lead, Arm. As a software engineer and music producer, I’m always exploring how technology can expand creative expression. That curiosity recently led me to build a personal sound generation app

March 13, 2026 huggingface

KV Cache from scratch in nanoVLM

We have implemented KV Caching from scratch in our nanoVLM repository (a small codebase to train your own Vision Language Model with pure PyTorch). This gave us a 38% speedup in generation. In this blog post we cover KV Caching and what we learned while implementing it. The lessons are general and apply to autoregressive generation in any language model. Implementing from scratch on a small codebase is a great learning experience, so come along for the ride!

March 13, 2026 huggingface

Introducing Training Cluster as a Service – a new collaboration with NVIDIA

Today at GTC Paris, we are excited to announce Training Cluster as a Service in collaboration with NVIDIA, to make large GPU clusters more easily accessible for research organizations all over the world, so they can train the foundational models of tomorrow in every domain. Making GPU Clusters Accessible: Many gigawatt-size GPU supercluster projects are being built to train the next generation of AI models. This can make it seem that the compute gap between the “GPU

March 13, 2026 huggingface

Post-Training Isaac GR00T N1.5 for LeRobot SO-101 Arm

NVIDIA Isaac GR00T (Generalist Robot 00 Technology) is a research and development platform for building robot foundation models and data pipelines, designed to accelerate the creation of intelligent, adaptable robots. Today, we announced the availability of Isaac GR00T N1.5, the first major update to Isaac GR00T N1, the world’s first open foundation model for generalized humanoid robot reasoning and skills. This cross-embodiment model processes multimodal inputs, including language and images, to perform manipulation tasks across diverse environments. It is adaptable […]

March 13, 2026 huggingface

Featherless AI on Hugging Face Inference Providers 🔥

We’re thrilled to share that Featherless AI is now a supported Inference Provider on the Hugging Face Hub! Featherless AI joins our growing ecosystem, enhancing the breadth and capabilities of serverless inference directly on the Hub’s model pages. Inference Providers are also seamlessly integrated into our client SDKs (for both JS and Python), making it super easy to use a wide variety of models with your preferred providers. Featherless AI supports a wide variety of text and conversational models, including […]

March 13, 2026 huggingface

🏎️ Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub

Boost your model performance with pre-optimized kernels, easily loaded from the Hub. Today, we’ll explore an exciting development from Hugging Face: the Kernel Hub! As ML practitioners, we know that maximizing performance often involves diving deep into optimized code, custom CUDA kernels, or complex build systems. The Kernel Hub simplifies this process dramatically! Below is a short example of how to use a kernel in your code.

    import torch
    from kernels import get_kernel
    activation = get_kernel("kernels-community/activation")
    x = torch.randn((10, 10), […]

March 13, 2026 huggingface

How Long Prompts Block Other Requests – Optimizing LLM Performance

At TNG, we are self-hosting numerous Large Language Models on our cluster of 24 H100 GPUs. Serving LLMs for over 50 applications, which together consume more than 100 million tokens and generate over 10 million tokens per day, requires us to carefully tune our request processing. In the previous part of our series on LLM performance, we looked into

March 13, 2026 huggingface

Groq on Hugging Face Inference Providers 🔥

We’re thrilled to share that Groq is now a supported Inference Provider on the Hugging Face Hub! Groq joins our growing ecosystem, enhancing the breadth and capabilities of serverless inference directly on the Hub’s model pages. Inference Providers are also seamlessly integrated into our client SDKs (for both JS and Python), making it super easy to use a wide variety of models with your preferred providers. Groq supports a wide variety of text and conversational models, including the latest open-source […]

March 13, 2026 huggingface

(LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware

In our previous post, Exploring Quantization Backends in Diffusers, we dived into how various quantization techniques can shrink diffusion models like FLUX.1-dev, making them significantly more accessible for inference without drastically compromising performance. We saw how bitsandbytes, torchao, and others reduce memory footprints for generating images. Performing inference is cool, but to make these models truly our own, we also need to be able to fine-tune them. Therefore, in this post, we tackle efficient fine-tuning of these models with peak […]
