Mini-R1: Reproduce DeepSeek R1 „aha moment“, an RL tutorial
This post was written by Philipp Schmid and originally posted on philschmid.de. The code can be found here.
The release of DeepSeek R1 shocked the industry. Why? Well, DeepSeek-R1 is an open model that rivals OpenAI’s o1 on complex reasoning tasks, and it was trained using Group Relative Policy Optimization (GRPO) and an RL-focused multi-stage training approach. They not only released the