Transformers backend integration in SGLang

The Hugging Face transformers library is the standard for working with state-of-the-art models — from experimenting with cutting-edge research to fine-tuning on custom data. Its simplicity, flexibility, and expansive model zoo make it a powerful tool for rapid development. But once you’re ready to move from notebooks to production, inference performance becomes mission-critical. That’s where SGLang comes in. Designed for high-throughput, low-latency inference, SGLang now offers seamless integration with transformers as a backend. This means you can pair the flexibility of […]

Read more
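In practice, this kind of backend integration is a launch-time switch rather than a code change. A minimal, hypothetical launch sketch — the `--model-impl transformers` flag name, port, and model ID are assumptions for illustration, not taken from the post (check `python -m sglang.launch_server --help` for the exact options):

```shell
# Serve a model through SGLang while delegating the modeling code to
# the transformers implementation (flag name and model ID are illustrative).
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.2-1B-Instruct \
  --model-impl transformers \
  --port 30000
```

Once the server is up, it exposes SGLang’s usual serving endpoints, so existing clients would not need to change.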

Gemma 3n fully available in the open-source ecosystem!

Gemma 3n was announced as a preview during Google I/O. The on-device community got really excited, because this is a model designed from the ground up to run locally on your hardware. On top of that, it’s natively multimodal, supporting image, text, audio, and video inputs 🤯 Today, Gemma 3n is finally available across the most widely used open-source libraries. These include transformers & timm, MLX, llama.cpp (text inputs), transformers.js, ollama, Google AI Edge, and others. This post quickly goes […]

Read more

Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub

NVIDIA Llama Nemotron Nano VL is a state-of-the-art 8B Vision Language Model (VLM) designed for intelligent document processing, offering high accuracy and multimodal understanding. Available on Hugging Face, it excels at extracting and understanding information from complex documents like invoices, receipts, contracts, and more. With its powerful OCR capabilities and strong results on the OCRBench v2 benchmark, this model delivers industry-leading accuracy for text and table extraction, as well as chart, diagram, and table parsing. Whether you’re automating financial document […]

Read more

Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models

Join us in building benchmarks that capture early-stage reasoning & scientific knowledge in LLMs! The development of Large Language Models (LLMs) typically begins with a series of ablation experiments, wherein various model architectures, data mixtures, and training hyperparameters are systematically evaluated. This phase is commonly referred to as the early stages of training. During this period, researchers primarily monitor two key metrics: the training loss curve and evaluation scores. However, existing evaluation benchmarks often fail to provide meaningful or discriminative […]

Read more

Efficient MultiModal Data Pipeline

You’ve got everything ready – data, model, a beefy GPU setup. You hit “run” and… wait. And wait some more. Your GPUs are barely breaking a sweat while your wallet’s getting lighter by the hour. Sound familiar? We’ve been there. After some detective work on our nanoVLM project, we discovered the real culprit wasn’t our model or hardware – it was our data pipeline, which was incredibly wasteful. Here’s what we found: Idle GPUs: Our model was literally waiting around for data […]

Read more
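The usual cure for idle GPUs is overlapping data loading with compute, so the accelerator pulls ready batches instead of blocking on I/O. A minimal stdlib sketch of that producer–consumer pattern (the sleep, the `prefetch` depth, and the toy “forward pass” are illustrative stand-ins, not nanoVLM’s actual pipeline):

```python
import queue
import threading
import time

def loader(batches, out_q):
    """Producer: simulates slow loading/decoding on a background thread."""
    for batch in batches:
        time.sleep(0.01)   # stand-in for disk I/O + image decoding
        out_q.put(batch)
    out_q.put(None)        # sentinel: no more data

def train(batches, prefetch=4):
    """Consumer: pulls prefetched batches instead of waiting on I/O."""
    q = queue.Queue(maxsize=prefetch)
    threading.Thread(target=loader, args=(batches, q), daemon=True).start()
    processed = []
    while (batch := q.get()) is not None:
        processed.append(batch * 2)   # stand-in for the forward/backward pass
    return processed

print(train([1, 2, 3]))  # → [2, 4, 6]
```

The bounded queue is the key design choice: it keeps a few batches ready ahead of time without letting the loader run arbitrarily far ahead and exhaust memory.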

SmolLM3: smol, multilingual, long-context reasoner

Small language models are becoming increasingly important as users seek capable models that can be deployed efficiently. The community has produced a fascinating range of capable small models, each pushing the boundaries of what’s possible at this scale. With SmolLM3, we’re excited to contribute a new, competitive, fully open 3B model: SmolLM3 sits in the efficiency sweet spot. Our 3B model outperforms Llama-3.2-3B and Qwen2.5-3B while staying competitive with larger 4B alternatives (Qwen3 & Gemma3). Beyond the performance numbers, we’re […]

Read more

Upskill your LLMs with Gradio MCP Servers

Have you ever wanted your favorite Large Language Model (LLM) to do more than just answer questions? What if it could edit images for you, browse the web, or organize your email inbox? Well, now it can! In this blog post, I’ll show you what the MCP protocol is and how it works similarly to […]

Read more