📚 3LM: A Benchmark for Arabic LLMs in STEM and Code

Why 3LM?

Arabic Large Language Models (LLMs) have made notable progress in recent years, yet existing benchmarks fall short in evaluating performance in high-value technical domains. Most evaluations to date focus on general-purpose tasks such as summarization, sentiment analysis, or generic question answering. Scientific reasoning and programming, however, are essential for a broad range of real-world applications, from education to technical problem-solving.

To address this gap, we introduce 3LM (علم),
