TimeScope: How Long Can Your Video Large Multimodal Model Go?

TimeScope is an open-source benchmark designed to measure how well vision-language models understand long videos. By inserting short “needle” clips into videos ranging from 1 minute to 8 hours, it evaluates three skills: localized retrieval, information synthesis, and fine-grained temporal perception. TimeScope reveals that many state-of-the-art models still struggle with true temporal comprehension. Recent advances in multimodal AI have produced models claiming to understand hour-long videos. This trend mirrors progress in long-context language models, […]

Read more

Parquet Content-Defined Chunking

Reduce Parquet file upload and download times on the Hugging Face Hub by leveraging the new Xet storage layer and Apache Arrow’s Parquet Content-Defined Chunking (CDC) feature, enabling more efficient and scalable data workflows. TL;DR: Parquet Content-Defined Chunking (CDC) is now available in PyArrow and Pandas, enabling efficient deduplication of Parquet files on content-addressable storage systems like Hugging Face’s Xet storage layer. CDC […]
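The intuition behind CDC: chunk boundaries are derived from the bytes themselves rather than from fixed offsets, so an edit only invalidates the chunks it touches, and everything after it re-aligns to identical, deduplicable chunks. A toy, stdlib-only sketch of the idea (not the actual Xet/Parquet implementation, which uses a proper rolling hash and tuned parameters; the hash, window, and mask below are arbitrary choices for illustration):

```python
def cdc_split(data: bytes, window: int = 8, mask: int = 0x1F) -> list[bytes]:
    """Split `data` at content-defined boundaries: declare a boundary
    wherever a hash of the trailing `window` bytes has its low bits
    all zero (expected chunk length is roughly mask + 1 bytes)."""
    chunks, start = [], 0
    for i in range(window, len(data) + 1):
        h = 0
        for b in data[i - window:i]:
            h = (h * 131 + b) & 0xFFFFFFFF  # toy polynomial hash of the window
        if h & mask == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])  # tail after the last boundary
    return chunks
```

Because each boundary depends only on a local window of bytes, inserting data near the start of a file changes the chunks spanning the edit, but the stream re-synchronizes afterwards and later chunks hash to the same content, which is what a content-addressable store can deduplicate.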

Read more

Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face

TL;DR: Trackio is a new, open-source, and free experiment tracking Python library that provides a local dashboard and seamless integration with Hugging Face Spaces for easy sharing and collaboration. Since trackio is a drop-in replacement for wandb, you can get started with the syntax you already know! Background If you have trained your own machine learning model, you know how important it is to be able to track metrics, parameters, and hyperparameters during training and visualize […]

Read more

📚 3LM: A Benchmark for Arabic LLMs in STEM and Code

Why 3LM? Arabic Large Language Models (LLMs) have seen notable progress in recent years, yet existing benchmarks fall short when it comes to evaluating performance in high-value technical domains. Most evaluations to date have focused on general-purpose tasks like summarization, sentiment analysis, or generic question answering. However, scientific reasoning and programming are essential for a broad range of real-world applications, from education to technical problem-solving. To address this gap, we introduce 3LM (علم), […]

Read more

Measuring Open-Source Llama Nemotron Models on DeepResearch Bench

Contributors: David Austin, Raja Biswas, Gilberto Titericz Junior, NVIDIA NVIDIA’s AI-Q Blueprint, the leading portable open deep research agent, recently climbed to the top of the Hugging Face “LLM with Search” leaderboard on DeepResearch Bench. This is a significant step forward for the open-source AI stack, proving that developer-accessible models can power advanced agentic workflows that rival or surpass closed alternatives. […]

Read more

Welcome GPT OSS, the new open-source model family from OpenAI!

GPT OSS is a hugely anticipated open-weights release by OpenAI, designed for powerful reasoning, agentic tasks, and versatile developer use cases. It comprises two models: a big one with 117B parameters (gpt-oss-120b), and a smaller one with 21B parameters (gpt-oss-20b). Both are mixture-of-experts (MoEs) and use a 4-bit quantization scheme (MXFP4), enabling fast inference (thanks to fewer active parameters, see details below) while keeping resource usage low. The large model fits on a single H100 GPU, while the small one […]
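A quick back-of-the-envelope check of why the large model fits on a single H100 (80 GB): MXFP4 stores 4-bit values plus a shared 8-bit scale per 32-element block, for an effective cost of roughly 4.25 bits per parameter. The numbers below are an approximation that ignores activations, KV cache, and any layers kept in higher precision:

```python
def approx_weight_gb(n_params: float, bits_per_param: float = 4.25) -> float:
    """Approximate weight memory in GB for a quantized model.
    4.25 bits/param assumes 4-bit values plus one 8-bit scale shared
    across each 32-element block (8 / 32 = 0.25 extra bits per param)."""
    return n_params * bits_per_param / 8 / 1e9

large = approx_weight_gb(117e9)  # gpt-oss-120b
small = approx_weight_gb(21e9)   # gpt-oss-20b
print(f"{large:.1f} GB, {small:.1f} GB")  # roughly 62.2 GB and 11.2 GB
```

Around 62 GB of weights leaves headroom on an 80 GB H100, consistent with the claim above; the 20B model's ~11 GB fits comfortably on consumer GPUs.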

Read more

Vision Language Model Alignment in TRL ⚡️

Vision Language Models (VLMs) are getting stronger, but aligning them to human preferences still matters. In TRL, we already showed how to post-train VLMs with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). This time, we’re going further. tl;dr Here’s what’s new in TRL: Mixed Preference Optimization (MPO) Group Relative Policy Optimization (GRPO) Group Sequence Policy Optimization (GSPO) (a variant of GRPO) These go beyond pairwise DPO, extracting richer signals from preference data and scaling better with modern VLMs. We’ve […]
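The core idea that separates GRPO from pairwise DPO is the group-relative advantage: sample several responses per prompt, score each, and normalize every reward against its own group. A minimal sketch of that computation (normalization details, e.g. population vs. sample standard deviation, vary by implementation; this is not the TRL code):

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sampled response relative to its group:
    (reward - group mean) / group std. Responses scored above the
    group average get positive advantage, those below it negative."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard a zero-variance group
    return [(r - mean) / std for r in rewards]

print(group_relative_advantages([0.0, 1.0, 1.0, 2.0]))
```

Because the baseline is the group mean rather than a learned value function, the signal uses every sampled response, not just a chosen/rejected pair.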

Read more

Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training

Training large models across multiple GPUs can be challenging due to the complexities of different parallelism strategies. In Accelerate, together with Axolotl, we have integrated a quick and easy way to use any combination of parallelism strategies in your training script! Here is how to add it to your training script:

from transformers import AutoModelForCausalLM
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig
from accelerate.utils import FullyShardedDataParallelPlugin

pc = ParallelismConfig(
    dp_shard_size=2,
    dp_replicate_size=2,
    cp_size=2,
    tp_size=2,
)
fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    auto_wrap_policy="transformer_based_wrap",
[…]
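The four sizes in the config above multiply together to give the number of GPUs the script expects, and each rank gets one coordinate along every parallelism axis. A toy illustration of that factorization (plain Python, not Accelerate internals; the axis ordering here is arbitrary):

```python
from itertools import product

def rank_to_coords(dp_replicate: int, dp_shard: int, cp: int, tp: int):
    """Lay flat ranks out on a 4-D device mesh: the world size is the
    product of the axis sizes, and each rank owns one
    (replicate, shard, cp, tp) coordinate. With every size set to 2,
    that is a 16-GPU job."""
    axes = (dp_replicate, dp_shard, cp, tp)
    world_size = 1
    for size in axes:
        world_size *= size
    coords = list(product(*(range(s) for s in axes)))
    return world_size, dict(enumerate(coords))

world, mesh = rank_to_coords(2, 2, 2, 2)
print(world)    # 16
print(mesh[0])  # (0, 0, 0, 0)
```

This is why launching with a mismatched number of processes fails immediately: the product of the configured sizes must equal the world size.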

Read more