Global MMLU

Human-verified multilingual evaluation at scale

Accelerating multilingual AI development through comprehensive, bias-aware evaluation.

Responsible Translation

We combine multiple translation approaches with human verification so that every language is represented fairly and accurately.

Inclusive Coverage

We deliberately include high-, mid-, and low-resource languages to expose performance gaps and ensure no communities are left behind in AI evaluation.

Cultural Expertise

We classify questions by cultural sensitivity to expose bias and ensure models work equally well across all communities and contexts.
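As a rough illustration of how a cultural-sensitivity label can expose performance gaps, the sketch below tabulates accuracy per language and per sensitivity class from a handful of made-up evaluation records. The field names (`language`, `culturally_sensitive`, `correct`) and the records themselves are assumptions for illustration only, not the dataset's actual schema or real results.

```python
from collections import defaultdict

# Hypothetical evaluation records. Field names and values are illustrative
# assumptions, not the benchmark's real schema or real model results.
records = [
    {"language": "en", "culturally_sensitive": False, "correct": True},
    {"language": "en", "culturally_sensitive": False, "correct": True},
    {"language": "en", "culturally_sensitive": True, "correct": True},
    {"language": "en", "culturally_sensitive": True, "correct": False},
    {"language": "sw", "culturally_sensitive": False, "correct": True},
    {"language": "sw", "culturally_sensitive": False, "correct": False},
    {"language": "sw", "culturally_sensitive": True, "correct": False},
]


def accuracy_by_group(rows):
    """Return {(language, culturally_sensitive): accuracy} for the given rows."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        key = (row["language"], row["culturally_sensitive"])
        totals[key] += 1
        hits[key] += int(row["correct"])
    return {key: hits[key] / totals[key] for key in totals}


def sensitivity_gap(acc, language):
    """Accuracy on culturally agnostic questions minus culturally sensitive ones."""
    return acc[(language, False)] - acc[(language, True)]


acc = accuracy_by_group(records)
for lang in sorted({row["language"] for row in records}):
    print(lang, "gap:", sensitivity_gap(acc, lang))
```

A positive gap for a language suggests the model does worse on culturally sensitive questions than on culturally agnostic ones in that language, which is the kind of disparity the classification is meant to surface.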

New Release

Global MMLU Lite V3

Our latest release adds six new languages to Global MMLU Lite, a lightweight benchmark for teams that need quick, accurate multilingual assessment.

  • Wider Coverage: Now includes Oriya, Hungarian, Tajik, Slovak, Czech, and Italian for broader representation.

  • Fast, Fair Evaluation: Reliable performance metrics across 23 languages with our streamlined 6,000-sample dataset.

  • Human Verified: Test model performance across diverse linguistic contexts with our lightweight, human-verified benchmark.

Global MMLU

Unlock the full potential of your multilingual AI with complete dataset evaluation across 42 languages.

  • Complete Coverage: 42 languages with 589,764 samples for thorough evaluation.

  • Research-Grade Depth: Full dataset with both human-translated and machine-translated samples.

  • Production-Ready: Used by frontier labs to test models for real-world deployment.

Global MMLU continuously evolves through open science collaboration with independent contributors and partner organizations worldwide, incorporating cultural expertise to create a fairer, more comprehensive benchmark.
