OLMo

Anthony Sandesh

In the rapidly evolving landscape of AI, the term "open-source" has become ambiguous. It often means you get access to the final model weights, but the crucial ingredients—the data, the training code, and the development process—remain a black box.
The Allen Institute for AI (AI2) is challenging this paradigm with OLMo (Open Language Model). OLMo isn't just another model; it's a scientific artifact. It's built on a philosophy of being "fully open," providing the entire recipe, not just the finished cake.
This guide provides a deep dive into the architecture, data, training methodology, and open philosophy that make OLMo a landmark project for AI research and development.

Part 1: The "Fully Open" Philosophy: What It Really Means

AI2's "fully open" approach goes far beyond releasing model weights. It's a commitment to complete transparency and reproducibility, giving the community access to the entire model-building ecosystem.
This includes:
  1. Open Training Data: Full access to the pre-training and post-training datasets. This includes AI2's massive Dolma dataset (an open 3T-token corpus) and the specialized Dolmino-Mix used for the latest models.
  2. Open Training Code: The high-performance OLMo-core repository. This isn't a simplified inference script; it's the actual code used to train the models from scratch, including all optimizations.
  3. Open Model Weights: All model weights are released, including the base pre-trained models and the final instruction-tuned variants.
  4. Open Training Logs & Checkpoints: This is a game-changer for researchers. AI2 releases thousands of intermediate checkpoints (snapshots of the model at different points during training). This allows anyone to "rewind the tape" and study how the model learned—a field of study known as "mechanistic interpretability."
  5. Open Evaluation Suite: The full evaluation code (OLMo-Eval) is available, allowing anyone to reproduce OLMo's benchmark results and fairly compare them against other models.
This ecosystem allows any researcher to ask and answer deep questions: "When did the model learn this fact?" "How did this specific data mixture affect its reasoning ability?" "What if we fine-tuned from the 2T token checkpoint instead of the final one?"

Part 2: Technical Deep Dive: The OLMo 2 Architecture

The OLMo 2 family (which includes the 1B, 7B, 13B, and 32B models) is a dense, decoder-only autoregressive Transformer. While based on the foundational Transformer architecture, it incorporates several key modifications for improved performance and training stability.
Here are the specific architectural choices that define OLMo 2:
  • No Biases: Following modern best practices, all Linear and LayerNorm layers are implemented without bias terms, which can improve training stability.
  • SwiGLU Activation: Instead of the standard ReLU, OLMo 2 uses the SwiGLU activation function in its feed-forward network (FFN) block. This has been shown to improve performance. The FFN's hidden size is set to $\frac{8}{3}d_{\text{model}}$ (where $d_{\text{model}}$ is the model's hidden dimension) and rounded to the nearest multiple of 128 for efficiency.
  • Rotary Position Embeddings (RoPE): It uses RoPE for its position embeddings, which has become the standard for high-performance models. OLMo 2 sets the RoPE $\theta$ (theta) parameter to 500,000, which helps the model handle long-context reasoning.
  • RMSNorm: It uses RMSNorm (Root Mean Square Normalization) instead of the standard LayerNorm. RMSNorm is simpler and computationally cheaper, and has been shown to be just as effective.
  • Reordered Normalization: Normalization is applied after the attention and FFN blocks rather than before them, departing from the pre-norm arrangement used by most recent Transformers. This reordering was found to improve training stability.
  • QK-Norm: To prevent large values in the attention logits that can lead to training instability, OLMo 2 applies RMSNorm to the query (Q) and key (K) vectors before the dot-product attention calculation.
  • Z-Loss: A "z-loss" regularizer is added to the training objective. This helps keep the logit values (the model's raw outputs before the final softmax) from growing too large, which further aids in training stability.
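The FFN sizing rule above is easy to make concrete. Here is a small sketch of it in Python (it implements the rounding rule exactly as described; the released model configs remain the authority on the actual dimensions):

```python
def ffn_hidden_size(d_model: int, multiple_of: int = 128) -> int:
    """Sketch of the SwiGLU FFN sizing rule described above:
    (8/3) * d_model, rounded to the nearest multiple of 128."""
    raw = 8 * d_model / 3
    return multiple_of * round(raw / multiple_of)

print(ffn_hidden_size(5120))  # d_model of the 32B model -> 13696
```

Rounding to a multiple of 128 keeps the matrix shapes friendly to GPU tensor cores.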
OLMo 2 32B Specifics:
  • Parameters: 32.2B
  • Layers: 64
  • Hidden Size ($d_{\text{model}}$): 5120
  • Attention Heads: 40
  • Context Length: 4096 tokens

Part 3: The Engine: Training and Data

A model is only as good as its data and training. OLMo 2's training is a sophisticated multi-stage process.

Stage 1: Pre-training

The model is first trained on a massive, general-purpose dataset to learn language, facts, and basic reasoning.
  • Dataset: OLMo-Mix-1124
  • Size: 3.9T (trillion) tokens
  • Content: A diverse mix of data, including AI2's own Dolma dataset, plus Starcoder (for code) and Proof Pile II (for mathematical reasoning).
  • Epochs: The 32B model was trained for ~1.5 epochs, totaling 6T tokens of data exposure.

Stage 2: Mid-training (or "Annealing")

After pre-training, the model is "specialized" on a smaller, extremely high-quality data mixture. This phase, which uses a smaller learning rate, is crucial for improving downstream task performance.
  • Dataset: Dolmino-Mix-1124
  • Content: This is a curated blend of high-quality web data, academic/scientific papers (peS2o), question-answering datasets (FLAN), and instruction-following data. It's designed to "patch" the model's capabilities in key areas.
  • Technique: Model Soup. The OLMo 2 32B model wasn't trained on the Dolmino-Mix just once. Four separate annealing runs were launched from the same pre-trained checkpoint: three on a 100B-token mix and one on a 300B-token mix. The final base model is a simple average of the weights from these runs. This "model souping" technique is known to improve robustness and general performance.
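The souping step itself is nothing more than a uniform average of the parameters. A minimal illustrative sketch (not AI2's actual tooling; it works on any mapping from parameter name to float or tensor):

```python
def model_soup(state_dicts):
    """Uniformly average each parameter across several runs that were
    fine-tuned from the same pre-trained checkpoint ('model souping')."""
    n = len(state_dicts)
    return {name: sum(sd[name] for sd in state_dicts) / n
            for name in state_dicts[0]}

# e.g. averaging the weights of separate annealing runs:
# souped = model_soup([run_a.state_dict(), run_b.state_dict(), run_c.state_dict()])
```

Averaging only makes sense because the runs share a common starting point; weights from unrelated trainings live in different loss basins and would not combine meaningfully.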

Stage 3: Post-training (Creating the "Instruct" Model)

The base model is a powerful text-completion engine, but it's not a helpful assistant. To create the OLMo-2-Instruct models, AI2 applies a state-of-the-art alignment recipe based on its Tülu 3.1 framework.
This process has three steps:
  1. Supervised Fine-Tuning (SFT): The model is fine-tuned on a high-quality dataset of instruction-response pairs (e.g., "Question: Who was... Answer: ..."). This teaches the model the format of following instructions and being a helpful assistant.
  2. Direct Preference Optimization (DPO): After SFT, the model learns human preferences. It is shown a prompt and two possible answers, one "chosen" (preferred) and one "rejected." DPO's algorithm efficiently teaches the model to increase the probability of generating responses like the "chosen" ones and decrease the probability of "rejected" ones.
  3. Reinforcement Learning with Verifiable Rewards (RLVR): This is the final, advanced step. Instead of using a fallible AI reward model (as in traditional RLHF), RLVR fine-tunes the model on tasks where correctness can be objectively verified. For example, in a math problem, the model's final answer can be checked against the correct solution. This provides a "gold standard" reward signal, robustly improving the model's factual accuracy and complex reasoning skills.
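To make the "verifiable reward" idea concrete, here is a toy reward function for math answers (a hypothetical sketch, not AI2's implementation): the reward is 1.0 exactly when the final number in the model's output matches the known-correct answer.

```python
import re

def verifiable_math_reward(model_output: str, gold_answer: str) -> float:
    """Binary RLVR-style reward: 1.0 iff the final number in the model's
    output exactly matches the known-correct answer, else 0.0."""
    match = re.search(r"(-?\d+(?:\.\d+)?)\s*\.?\s*$", model_output.strip())
    if match is None:
        return 0.0  # no final number to check against
    return 1.0 if match.group(1) == gold_answer else 0.0

print(verifiable_math_reward("So the total is 42.", "42"))  # 1.0
```

Unlike a learned reward model, this signal cannot be "gamed" by plausible-sounding but wrong answers, which is exactly why RLVR is effective for math and factual tasks.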

Part 4: How to Use OLMo (The Practical Guide)

You can run, fine-tune, and analyze OLMo right now.

1. Quick Inference with Hugging Face

This is the fastest way to try the model.
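A minimal sketch using the transformers library. The model id below is one of the published OLMo 2 Instruct checkpoints on the allenai Hugging Face org (substitute any size you like); note that the full weights are downloaded on first use, and a GPU is strongly recommended.

```python
def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Run a single chat turn against an OLMo 2 Instruct model."""
    # Heavy imports kept inside the function; requires `pip install transformers torch`.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "allenai/OLMo-2-1124-7B-Instruct"  # assumed checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the tokens generated after the prompt.
    return tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)

# print(generate("What makes OLMo 'fully open'?"))
```

Using the tokenizer's chat template matters for Instruct models: it wraps your prompt in the exact format the model saw during SFT.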
 

2. Analyzing Intermediate Checkpoints

This is where the research power of OLMo shines. You can load any checkpoint from the training run by specifying the revision.
The checkpoints are named by step and tokens, e.g., step250000-tokens2098B.
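For example (the base-model id below is assumed; the revision string is simply the checkpoint name, passed straight through to the Hugging Face Hub):

```python
def parse_revision(revision: str):
    """Split a checkpoint name like 'step250000-tokens2098B' into
    its training step and token-count label."""
    step_part, tokens_part = revision.split("-")
    return int(step_part.removeprefix("step")), tokens_part.removeprefix("tokens")

def load_checkpoint(model_id: str = "allenai/OLMo-2-1124-7B",  # assumed id
                    revision: str = "step250000-tokens2098B"):
    """Load one intermediate training snapshot from the Hugging Face Hub."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # heavy import, kept local
    model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    return model, tokenizer

print(parse_revision("step250000-tokens2098B"))  # (250000, '2098B')
```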
 
You can now compare this model's outputs and internal states to the final model, allowing for deep analysis of its learning trajectory.

3. Fine-Tuning with OLMo-core

For serious fine-tuning or pre-training from scratch, you'll want to use the official OLMo-core repository.
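A rough setup sketch (the repository path is assumed from the project name; consult the repo's README for the exact training entry points and config files):

```shell
# Clone and install OLMo-core in editable mode (assumed repo location).
git clone https://github.com/allenai/OLMo-core.git
cd OLMo-core
pip install -e .

# Training runs are typically launched with torchrun across your GPUs;
# the entry-point script and config names are documented in the README.
# torchrun --nproc_per_node=8 <train-script> <config-overrides>
```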
 
The OLMo-core repository is highly optimized for large-scale training on multi-GPU, multi-node clusters, giving you the same full control over every hyperparameter that the AI2 team had.

Conclusion

OLMo is not just a product; it's a new standard for open science in AI. By providing every component of its creation, AI2 has given the world a powerful tool, a rich dataset for analysis, and an open invitation to build the next generation of AI models together. Whether you are a developer looking for a strong base model, a researcher studying the fundamentals of deep learning, or an enthusiast curious about what's inside an LLM, the OLMo project has something for you.
