PipelineRL

We are excited to open-source PipelineRL, an experimental RL implementation that tackles a fundamental challenge in large-scale Reinforcement Learning with LLMs: the trade-off between inference throughput and on-policy data collection. PipelineRL’s key innovation is inflight weight updates during RL training (see Figure 1 below). This allows PipelineRL to sustain consistently high inference throughput while minimizing the lag between the weights used for rollouts and the most recently updated model weights. The result: fast and stable RL training for large language […]


What is AutoRound?

As large language models (LLMs) and vision-language models (VLMs) continue to grow in size and complexity, deploying them efficiently becomes increasingly challenging. Quantization offers a solution by reducing model size and inference latency. Intel’s AutoRound emerges as a cutting-edge quantization tool that balances accuracy, efficiency, and compatibility. AutoRound is a weight-only post-training quantization (PTQ) method developed by Intel. It uses signed gradient descent to jointly optimize weight rounding and clipping ranges, enabling accurate low-bit quantization (e.g., INT2 – INT8) with […]


The Transformers Library: standardizing model definitions

TLDR: Going forward, we’re aiming for Transformers to be the pivot across frameworks: if a model architecture is supported by transformers, you can expect it to be supported in the rest of the ecosystem. Transformers was created in 2019, shortly after the release of the BERT Transformer model. Since then, we’ve continuously aimed to add state-of-the-art architectures, initially focused on NLP, then expanding to audio and computer vision. Today, transformers is the default library for LLMs and VLMs in the […]


Falcon-Edge: A series of powerful, universal, fine-tunable 1.58bit language models.

In this blog post, we present the key highlights of and rationale behind the Falcon-Edge series, a collection of powerful, universal, and fine-tunable language models available in ternary format, based on the BitNet architecture. Drawing from our experience with BitNet, Falcon-Edge introduces and validates a new pre-training paradigm that delivers a full-scope output from a single training process, simultaneously yielding both non-quantized and quantized model variants. This comprehensive approach produces a non-BitNet model in bfloat16 format, the native BitNet model, and […]


Microsoft and Hugging Face expand collaboration to make open models easy to use on Azure

Today at the Microsoft Build conference, Satya Nadella announced an expanded collaboration with Hugging Face to make its wide range of open models easy to deploy on Azure’s secure infrastructure. If you head over to Azure AI Foundry today, you will find a vastly expanded collection of 10,000+ Hugging Face models you can deploy in a couple of clicks to power AI applications working with text, audio, and images. And we’re just getting started!
