Introducing the Open Chain of Thought Leaderboard
Chain-of-thought prompting is emerging as a powerful and effective design pattern for LLM-based apps and agents. The basic idea of chain-of-thought prompting is to let a model generate a step-by-step solution (“reasoning trace”) before answering a question or taking a decision. With the Open CoT Leaderboard we’re tracking LLMs’ ability to generate effective chain-of-thought traces for challenging reasoning tasks.
Unlike most performance based leaderboards, we’re not scoring the absolute accuracy a model achieves on a given task, but the difference between the accuracy with and without chain-of-thought prompting:
accuracy gain Δ = accuracy with CoT – accuracy w/o