OLMo 3 Evaluation Task Clusters
This section lists the evaluation task clusters for OLMo 3 base models and post-trained Think/Instruct models.
OlmoBaseEval task clusters (base model pretraining):
MC STEM
MC Non-STEM
GenQA
Math
Code
Code Fill-in-the-Middle (FIM)
Think/Instruct task clusters (olmo3:adapt):
Knowledge
Reasoning
Math
Coding
Chat / Instruction Following
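For scripting against these lists, the two groupings can be written down as a plain mapping. This is only an illustrative sketch: the cluster names and the labels `OlmoBaseEval` and `olmo3:adapt` are copied from the lists above, while the `TASK_CLUSTERS` dictionary and the `suites_with_cluster` helper are hypothetical names introduced here and are not part of any OLMo tooling.

```python
# Hypothetical data structure: suite label -> cluster names.
# The names come from the lists above; the layout is an assumption.
TASK_CLUSTERS = {
    "OlmoBaseEval": [
        "MC STEM",
        "MC Non-STEM",
        "GenQA",
        "Math",
        "Code",
        "Code Fill-in-the-Middle (FIM)",
    ],
    "olmo3:adapt": [
        "Knowledge",
        "Reasoning",
        "Math",
        "Coding",
        "Chat / Instruction Following",
    ],
}


def suites_with_cluster(name):
    """Return the suite labels whose cluster list contains `name` exactly."""
    return sorted(
        suite for suite, clusters in TASK_CLUSTERS.items() if name in clusters
    )


if __name__ == "__main__":
    # "Math" is the only cluster name that appears in both lists.
    print(suites_with_cluster("Math"))
```

Note that the base-model list uses "Code" while the Think/Instruct list uses "Coding", so an exact-match lookup treats them as different clusters.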