Illustrating Reinforcement Learning from Human Feedback (RLHF)
This article has been translated into Chinese (简体中文) and Vietnamese (đọc tiếng việt).
Language models have shown impressive capabilities in the past few years by generating diverse and compelling text from human input prompts. However, what makes a “good” text is inherently hard to define, as it is subjective and context-dependent. Applications vary widely: story writing calls for creativity, informative text should be truthful, and code snippets should be executable.
Writing a loss function to capture these attributes seems intractable and most language models