Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers

Sentence Transformers is a Python library for using and training embedding and reranker models for applications like retrieval augmented generation, semantic search, and more. In my previous blogpost, I introduced the new multimodal capabilities, showing how to use embedding and reranker models that handle text, images, audio, and video. In this blogpost, I’ll show you how to train or finetune these multimodal models on your own data. As a practical example, I’ll walk through finetuning Qwen/Qwen3-VL-Embedding-2B for Visual Document Retrieval […]

The PR you would have opened yourself

Making transformers models available in mlx-lm using a Skill and test harness. TL;DR: We provide a Skill and a test harness to help port language models from transformers to mlx-lm, so they become (almost) instantly available the moment they are added to transformers. The Skill is designed to support contributors and reviewers as an aide, not an automation. We explain why and how we built it, and comment on how to contribute meaningfully to open source in the age of agents. […]

Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents

TL;DR — We extend the RLVE framework from single-turn reasoning puzzles to multi-turn, tool-augmented e-commerce conversations. EcomRLVE-GYM provides 8 verifiable environments — product discovery, substitution, cart building, returns, order tracking, policy QA, bundle planning, and multi-intent journeys — each with procedural problem generation, a 12-axis difficulty curriculum, and algorithmically verifiable rewards. We train a Qwen 3 8B model with DAPO over 300 steps and present early results demonstrating that environment scaling and adaptive difficulty transfer to agentic, real-world task completion. […]

Building a Fast Multilingual OCR Model with Synthetic Data

Training a high-quality OCR model requires a large quantity of annotated image-text pairs: images with precise bounding boxes, transcriptions, and ideally reading order information at the word, line, and paragraph level. Every approach to curating this data comes with tradeoffs. Existing benchmark datasets like ICDAR and Total-Text have clean labels but limited scale, typically tens of thousands of images skewed toward English and Chinese. Manual annotation produces the highest quality labels but is expensive and slow, making it impractical at […]

Training mRNA Language Models Across 25 Species for $165

By OpenMed, Open-Source Agentic AI for Healthcare & Life Sciences. TL;DR: We built an end-to-end protein AI pipeline covering structure prediction, sequence design, and codon optimization. After comparing multiple transformer architectures for codon-level language modeling, CodonRoBERTa-large-v2 emerged as the clear winner with a perplexity of 4.10 and a Spearman CAI correlation of 0.40, significantly outperforming ModernBERT. We then scaled to 25 species, trained 4 production models in 55 GPU-hours, and built a species-conditioned system that no other open-source project offers. […]

gradio.Server: Any Custom Frontend with Gradio’s Backend

A few weeks ago, we wrote about one-shotting full web apps with gr.HTML: building rich, interactive frontends entirely inside Gradio using custom HTML, CSS, and JavaScript. That unlocked a lot. But what if that’s not enough? What if you want to build entirely with your own frontend framework, such as React, Svelte, or even plain HTML/JS, while still benefiting from Gradio’s queuing system, API infrastructure, MCP support, and ZeroGPU on Spaces? That’s exactly the problem gradio.Server solves. And it changes what’s […]

Safetensors is Joining the PyTorch Foundation

Today, we’re announcing that Safetensors has joined the PyTorch Foundation as a foundation-hosted project under the Linux Foundation, alongside DeepSpeed, Helion, Ray, vLLM, and PyTorch itself. How we got here: Safetensors started as a Hugging Face project born out of a concrete need: a way to store and share model weights that couldn’t execute arbitrary code. The pickle-based formats that dominated the ecosystem at the time meant that there was a very real risk you’d be running malicious code. While […]

ALTK‑Evolve: On‑the‑Job Learning for AI Agents

Most AI agents re‑read transcripts instead of learning principles, so they repeat mistakes and don’t transfer lessons to new situations. ALTK‑Evolve turns raw agent trajectories into reusable guidelines. In benchmarks, the approach boosted reliability, especially on hard, multi‑step tasks (Δ 14.2% on AppWorld), without bloating context. The “eternal intern” problem: Imagine a brilliant line cook who has memorized every cookbook but forgets your kitchen every morning. They don’t remember your oven runs hot, or that regulars like extra salt; they’ll […]

Multimodal Embedding & Reranker Models with Sentence Transformers

Sentence Transformers is a Python library for using and training embedding and reranker models for applications like retrieval augmented generation, semantic search, and more. With the v5.4 update, you can now encode and compare texts, images, audio, and videos using the same familiar API. In this blogpost, I’ll show you how to use these new multimodal capabilities for both embedding and reranking. Multimodal embedding models map inputs from different modalities into a shared embedding space, while multimodal reranker models score […]

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs

What is Waypoint-1.5? Waypoint-1.5 is Overworld’s next real-time video world model, built to bring interactive generative worlds to the hardware people actually own. The first release of Waypoint showed that real-time generative worlds were possible. It proved that interactive world models could be more than passive video demos, and that locally runnable systems could begin to close the gap between generating a world and actually stepping into one. Waypoint-1.5 builds directly on that […]
