RLHF / ChatGPT has been a popular research topic these days. In our quest to research more on RLHF, this blog post attempts to do a reproduction of OpenAI’s 2019 original RLHF codebase at openai/lm-human-preferences. Despite its “tensorflow-1.x-ness,” OpenAI’s original codebase is very well-evaluated and benchmarked, making it a good place to study RLHF implementation engineering details. We aim to: reproduce OAI’s results in stylistic tasks and match the learning curves of openai/lm-human-preferences. present a checklist of implementation details, similar […]
Read more