OLMo 3 Evaluation Task Clusters

This section lists the evaluation task clusters for OLMo 3 base models and post-trained Think/Instruct models.

OlmoBaseEval task clusters (base model pretraining):

  • MC STEM

  • MC Non-STEM

  • GenQA

  • Math

  • Code

  • Code Fill-in-the-Middle (FIM)

Think/Instruct task clusters (olmo3:adapt):

  • Knowledge

  • Reasoning

  • Math

  • Coding

  • Chat / Instruction Following
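
The two groupings above can be written down as a simple lookup table, which may help when aggregating scores per cluster. The following is only a minimal sketch: the cluster names are copied from the lists above, while `TASK_CLUSTERS` and `suites_containing` are hypothetical names chosen for illustration and are not part of any OLMo tooling.

```python
# Illustrative only: cluster names come from the lists above;
# TASK_CLUSTERS and suites_containing are hypothetical helper names.
TASK_CLUSTERS: dict[str, list[str]] = {
    # Base model clusters (OlmoBaseEval)
    "OlmoBaseEval": [
        "MC STEM",
        "MC Non-STEM",
        "GenQA",
        "Math",
        "Code",
        "Code Fill-in-the-Middle (FIM)",
    ],
    # Think/Instruct clusters (olmo3:adapt)
    "olmo3:adapt": [
        "Knowledge",
        "Reasoning",
        "Math",
        "Coding",
        "Chat / Instruction Following",
    ],
}


def suites_containing(cluster: str) -> list[str]:
    """Return the suites whose cluster list includes the given cluster name."""
    return [suite for suite, clusters in TASK_CLUSTERS.items() if cluster in clusters]


if __name__ == "__main__":
    for suite, clusters in TASK_CLUSTERS.items():
        print(f"{suite}: {len(clusters)} clusters")
    print("Suites with a Math cluster:", suites_containing("Math"))
```

Note that "Math" is the only cluster name shared verbatim by both groupings; the base suite uses "Code" where the Think/Instruct suite uses "Coding".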