Mini-R1: Reproduce the DeepSeek R1 “aha moment”, an RL tutorial


This post was written by Philipp Schmid and originally posted on philschmid.de. The code can be found here.

The release of DeepSeek R1 shocked the industry. Why? Well, DeepSeek-R1 is an open model that rivals OpenAI’s o1 in complex reasoning tasks, introduced using Group Relative Policy Optimization (GRPO) and an RL-focused multi-stage training approach. They not only released the
