Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models

Numina & Kimi Team

Figure 1: Performance comparison of theorem proving models on the miniF2F-test dataset.

We’re excited to announce the release of Kimina-Prover-72B, our state-of-the-art theorem proving model trained with the Kimi k1.5[1] RL pipeline based on Qwen2.5-72B [2]. Alongside it, we are also releasing two distilled variants: Kimina-Prover-Distill-8B and 1.7B (based on Qwen3-8B and Qwen3-1.7B[3] respectively).

Our key innovations include:

Test-Time Reinforcement Learning Search: A trainable agentic proving framework that enables the model to recursively discover, combine and

To finish reading, please visit source site