Series: The Learn Arc · 50 posts teaching Active Inference through a live BEAM-native workbench. ← Part 16: Session 2.2. This is Part 17.
The session
Chapter 2, §3. Session title: The cost of being wrong. Route: /learn/session/2/s3_cost_of_being_wrong.
In Session 2.2 we derived the bound. Session 2.3 asks the engineer's question: when the model is wrong, where does that wrongness land? Knowing the answer tells you which wrongness to accept and which to fix.
The decomposition
Free energy splits two ways. The book writes both; each gives you a different intuition.
Split 1 (complexity + accuracy):

F[Q, o] = KL( Q(s) || P(s) ) − E_Q[ log P(o|s) ]
          └─── complexity ──┘  └─── accuracy ──┘
Complexity penalizes Q for straying from the prior. Accuracy rewards explaining the observation. Minimize the sum and you get a Q that explains the data while staying close to what you believed before. It's Occam's razor for Bayesian inference, written in nats.
Split 2 (energy + entropy):

F[Q, o] = E_Q[ −log P(o,s) ] − H[Q]
          └───── energy ────┘  entropy
Energy pulls Q toward high-density regions of the joint; entropy pushes Q to stay spread. The equilibrium is the posterior.
Both splits are the same functional. They're two lenses on one number.
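A minimal numeric check of that identity, with toy numbers (illustrative only, not Workbench code): compute F both ways for a two-state model and confirm the splits agree.

```python
import numpy as np

# Toy two-state model; all numbers are made up for illustration.
prior = np.array([0.7, 0.3])      # P(s)
lik = np.array([0.9, 0.2])        # P(o|s) for the one observed o
q = np.array([0.6, 0.4])          # an arbitrary belief Q(s)

joint = prior * lik               # P(o, s)

# Split 1: complexity minus accuracy
complexity = np.sum(q * np.log(q / prior))   # KL(Q || P(s))
accuracy = np.sum(q * np.log(lik))           # E_Q[log P(o|s)]
f_split1 = complexity - accuracy

# Split 2: energy minus entropy
energy = -np.sum(q * np.log(joint))          # E_Q[-log P(o,s)]
entropy = -np.sum(q * np.log(q))             # H[Q]
f_split2 = energy - entropy

assert np.isclose(f_split1, f_split2)        # same functional, two lenses
```

The assertion also implicitly checks the algebra: expanding KL(Q||P) − E_Q[log P(o|s)] regroups the log terms into E_Q[−log P(o,s)] − H[Q].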
Where wrongness lives
With the decomposition in hand, you can answer the engineer's question:
If your prior P(s) is wrong → the complexity term lies. Q will drift away from the true posterior toward your wrong prior. Cost: biased inference. Remedy: update the prior (Chapter 7's Dirichlet learning does this automatically).
If your likelihood P(o|s) is wrong → the accuracy term lies. Q will explain observations under a false sensor model. Cost: systematically over- or under-confident beliefs. Remedy: Dirichlet-learn the A matrix, or widen its prior variance.
If your variational family Q is too narrow → the KL-divergence residual from Session 2.2 stays positive even after convergence. Cost: an irreducible gap between Q and the true posterior. Remedy: a richer family (mean-field → structured → particle filter).
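The third failure mode can be made concrete. A sketch with assumed toy numbers (not Workbench code): a correlated two-factor posterior cannot be matched by any factorized Q, so even a mean-field fit built from the exact marginals leaves a positive KL residual.

```python
import numpy as np

# True joint posterior over (s1, s2), deliberately correlated:
# probability mass sits on the diagonal, so it does not factorize.
post = np.array([[0.45, 0.05],
                 [0.05, 0.45]])

# A mean-field candidate: product of the exact marginals.
q1 = post.sum(axis=1)    # marginal over s1
q2 = post.sum(axis=0)    # marginal over s2
q = np.outer(q1, q2)     # factorized Q(s1, s2) = Q(s1) Q(s2)

# KL(Q || posterior): the residual no amount of iteration removes,
# because no product distribution equals this correlated posterior.
residual = np.sum(q * np.log(q / post))
print(residual)          # strictly positive
```

This is the irreducible gap the session describes: fixing it requires a richer family (one that can represent the correlation), not more iterations.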
Every practical Active Inference system tells you which of these three is costing the most. The Workbench uses the Glass signal river and the equation-by-equation provenance to let you attribute error by line of math.
The complexity/accuracy recipe
/cookbook/vfe-decompose-complexity-accuracy runs an agent and logs both terms per tick. You watch them trade off:
- Early ticks → accuracy dominates (a new observation means large −log P(o|s) until beliefs update).
- Mid run → the terms equalize (the agent has balanced its evidence against its prior).
- Convergence → complexity dominates (the remaining residual is mostly "how far Q has drifted from P(s)").
Two lines on one chart. The chart is the diagnostic.
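A hedged sketch of what the recipe logs, assuming a toy two-state agent with a known A matrix (illustrative only; the cookbook's actual instrumentation lives in the Workbench): per tick, record the complexity KL(Q || P(s)) and the accuracy cost −E_Q[log P(o|s)].

```python
import numpy as np

rng = np.random.default_rng(0)
prior = np.array([0.5, 0.5])     # P(s): the agent's fixed prior
A = np.array([[0.8, 0.3],        # P(o|s): rows = observations, cols = states
              [0.2, 0.7]])
true_state = 0
q = prior.copy()                 # Q starts at the prior
complexity_log, accuracy_log = [], []

for tick in range(10):
    o = rng.choice(2, p=A[:, true_state])            # sample an observation
    complexity_log.append(float(np.sum(q * np.log(q / prior))))
    accuracy_log.append(float(-np.sum(q * np.log(A[o]))))
    post = q * A[o]                                  # Bayes update of the belief
    q = post / post.sum()

# complexity_log starts at 0 (Q == prior) and grows as Q drifts;
# accuracy_log is the per-tick surprisal cost. Plot both for the chart.
```

The two lists are the "two lines on one chart": complexity rises from zero as beliefs drift from the prior, while the accuracy cost falls as observations become explained.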
The mental move
Free energy is not one thing. It's a sum of costs, and different bad-model pathologies load onto different summands. When your agent behaves weirdly, decompose F and the weirdness localizes. That is the difference between theory and working engineering.
Why this matters for Chapter 9
We'll return to this in full at Part 29 (Session 9.1). When you're fitting an Active Inference model to human behavioral data, the final F value is your log Bayes factor. But to debug the fit โ to know whether your model is systematically wrong or only contingently noisy โ you need the decomposition from this session. Chapter 2 plants the tool; Chapter 9 uses it.
The concepts this session surfaces
- Complexity: KL from the posterior to the prior; the "how far did Q drift" cost.
- Accuracy: expected log-likelihood; the "how well do you explain the data" reward.
- Energy: E_Q[−log P(o,s)]; the joint-model cost.
- Entropy: H[Q]; the belief's spread.
The quiz
Q: An agent's free energy stays high after many ticks of observation. The complexity term is tiny; the accuracy term is large. What's wrong?
- ✗ The variational family is too narrow.
- ✓ The likelihood P(o|s) is mis-specified.
- ✗ The prior P(s) is wrong.
- ✗ The softmax temperature is too low.
Why: Large accuracy means the agent's Q can't explain observations. If complexity is small, Q is close to P(s), so the drift from the prior isn't the issue. The likelihood model is predicting observations that don't match reality.
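The quiz pattern is easy to reproduce. An illustrative sketch (toy numbers, not Workbench code): a likelihood that assigns low probability to the observed outcome under every state keeps the accuracy cost large while Q barely moves from the prior.

```python
import numpy as np

prior = np.array([0.5, 0.5])       # P(s)
A_model = np.array([0.10, 0.20])   # model's P(o_observed|s): wrong for BOTH states

post = prior * A_model
q = post / post.sum()              # exact posterior under the (bad) model

complexity = np.sum(q * np.log(q / prior))       # KL(Q || P(s)): stays tiny
accuracy_cost = -np.sum(q * np.log(A_model))     # -E_Q[log P(o|s)]: stays large
print(complexity, accuracy_cost)   # tiny vs large: the quiz's signature
```

Because neither state explains the observation, no amount of belief-shifting reduces the accuracy cost, and the small complexity term rules out prior drift as the culprit.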
Run it yourself
- /learn/session/2/s3_cost_of_being_wrong – session page.
- /cookbook/vfe-decompose-complexity-accuracy – watch the terms split live.
- /cookbook/perception-noisy-sensor-robustness – how the accuracy term reacts to a bad likelihood.
- /cookbook/perception-sweep-iteration-budget – what happens if you stop F minimization early.
- /equations – VFE family, with both decompositions listed.
Next
Part 18: Session 2.4 – Action as inference. Chapter 2's fourth and final session. The move that flips free energy from "posterior machinery" to "policy machinery", and sets up Chapter 3's Expected Free Energy. We see what happens when you let the observation be a free variable.
⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license
📘 Active Inference, Parr, Pezzulo, Friston – MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference
← Part 16: Session 2.2 · Part 17: Session 2.3 (this post) · Part 18: Session 2.4 – coming soon