March 13, 2026 huggingface

Putting RL back in RLHF

We are excited to introduce the RLOO (REINFORCE Leave One-Out) Trainer in TRL. As an alternative to PPO, RLOO is a new online RLHF training algorithm designed to be more accessible and easier to implement.
