Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models
Numina & Kimi Team
Figure 1: Performance comparison of theorem proving models on the miniF2F-test dataset.
We’re excited to announce the release of Kimina-Prover-72B, our state-of-the-art theorem proving model trained with the Kimi k1.5[1] RL pipeline based on Qwen2.5-72B [2]. Alongside it, we are also releasing two distilled variants: Kimina-Prover-Distill-8B and 1.7B (based on Qwen3-8B and Qwen3-1.7B[3] respectively).
Our key innovations include:
-
Test-Time Reinforcement Learning Search: A trainable agentic proving framework that enables the model to recursively discover, combine and
