Global MMLU
Human-verified multilingual evaluation at scale
Accelerating multilingual AI development through comprehensive, bias-aware evaluation.
Responsible Translation
Inclusive Coverage
Cultural Expertise
New Release
Global MMLU Lite V3
Wider Coverage: Now featuring Oriya, Hungarian, Tajik, Slovak, Czech and Italian for even broader representation.
Fast, Fair Evaluation: Reliable performance metrics across 23 languages with our streamlined 6,000-sample dataset.
Human Verified: Test model performance across diverse linguistic contexts with our lightweight, human-verified benchmark.
Global MMLU
Evaluate your multilingual AI on the complete dataset across 42 languages.
Complete Coverage: 42 languages with 589,764 samples for thorough evaluation.
Research-Grade Depth: Full dataset with both human-translated and machine-translated samples.
Production-Ready: Used by frontier labs to test models for real-world deployment.
Global MMLU continuously evolves through open science collaboration with independent contributors and partner organizations worldwide, incorporating cultural expertise to create a fairer, more comprehensive benchmark.
