📚 3LM: A Benchmark for Arabic LLMs in STEM and Code





Why 3LM?

Arabic Large Language Models (LLMs) have made notable progress in recent years, yet existing benchmarks fall short in evaluating performance in high-value technical domains. Most evaluations to date have focused on general-purpose tasks such as summarization, sentiment analysis, or generic question answering. Scientific reasoning and programming, however, are essential for a broad range of real-world applications, from education to technical problem-solving.

To address this gap, we introduce 3LM (علم),

 

 

 
