NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates
We’re happy to introduce the NPHardEval leaderboard, using NPHardEval, a cutting-edge benchmark developed by researchers from the University of Michigan and Rutgers University.
NPHardEval introduces a dynamic, complexity-based framework for assessing Large Language Models’ (LLMs) reasoning abilities. It poses 900 algorithmic questions spanning the NP-Hard complexity class and lower, designed to rigorously test LLMs, and is updated on a monthly basis to prevent overfitting!
A Unique Approach to LLM Evaluation
NPHardEval stands apart