Letting Large Models Debate: The First Multilingual LLM Debate Competition

Over the past year, static evaluations and user-driven arenas have shown their limitations and biases. Here, we explore a novel way to evaluate LLMs: debate.
Debate is an excellent way to showcase reasoning strength and language ability. It has been used throughout history, from the debates in the Athenian Ecclesia in the 5th century BCE to today’s World Universities Debating Championship.
Do today’s large language models exhibit debate skills similar to humans? Which model is currently the best at debating? What can we learn from models when they debate against one another?
To answer these questions, BAAI has created a

 

 

 
