Series: The Learn Arc · 50 posts teaching Active Inference through a live BEAM-native workbench. ← Part 16: Session 2.2. This is Part 17.
The session
Chapter 2, §3. Session title: The cost of being wrong. Route: /learn/session/2/s3_cost_of_being_wrong.
In Session 2.2 we derived the bound. Session 2.3 asks the engineer's question: when the model is wrong, where does that wrongness land? Knowing the answer tells you which wrongness to accept and which to fix.
The decomposition
Free energy splits two ways. The book writes both; each gives you a different intuition.
Split 1 (complexity + accuracy):

F[Q, o] = KL( Q(s) || P(s) ) − E_Q[ log P(o|s) ]
          └─── complexity ──┘  └─── accuracy ──┘
Complexity penalizes Q for straying from the prior. Accuracy rewards explaining the observation. Minimize the sum and you get a Q that explains the data while staying close to what you believed before. It's Occam's razor for Bayesian inference, written in nats.
Split 2 (energy + entropy):

F[Q, o] = E_Q[ −log P(o,s) ] − H[Q]
          └───── energy ────┘  entropy
Energy pulls Q toward high-density regions of the joint; entropy pushes Q to stay spread. The equilibrium is the posterior.
Both splits are the same functional. They're two lenses on one number.
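A minimal numeric check of that identity, with toy numbers (illustrative only, not Workbench code): compute F both ways for a two-state model and confirm the splits agree.

```python
import numpy as np

# Toy two-state model; all numbers are made up for illustration.
prior = np.array([0.7, 0.3])      # P(s)
lik = np.array([0.9, 0.2])        # P(o|s) for the one observed o
q = np.array([0.6, 0.4])          # an arbitrary belief Q(s)

joint = prior * lik               # P(o, s)

# Split 1: complexity minus accuracy
complexity = np.sum(q * np.log(q / prior))   # KL(Q || P(s))
accuracy = np.sum(q * np.log(lik))           # E_Q[log P(o|s)]
f_split1 = complexity - accuracy

# Split 2: energy minus entropy
energy = -np.sum(q * np.log(joint))          # E_Q[-log P(o,s)]
entropy = -np.sum(q * np.log(q))             # H[Q]
f_split2 = energy - entropy

assert np.isclose(f_split1, f_split2)        # same functional, two lenses
```

The assertion also implicitly checks the algebra: expanding KL(Q||P) − E_Q[log P(o|s)] regroups the log terms into E_Q[−log P(o,s)] − H[Q].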
Where wrongness lives
With the decomposition in hand, you can answer the engineer's question:
If your prior P(s) is wrong → the complexity term lies. Q will drift away from the true posterior toward your wrong prior. Cost: biased inference. Remedy: update the prior (Chapter 7's Dirichlet learning does this automatically).
If your likelihood P(o|s) is wrong → the accuracy term lies. Q will explain observations under a false sensor model. Cost: systematically over- or under-confident beliefs. Remedy: Dirichlet-learn the A matrix, or widen its prior variance.
If your variational family Q is too narrow → the KL-divergence residual from Session 2.2 stays positive even after convergence. Cost: an irreducible gap between Q and the true posterior. Remedy: a richer family (mean-field → structured → particle filter).
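The third failure mode can be made concrete. A sketch with assumed toy numbers (not Workbench code): a correlated two-factor posterior cannot be matched by any factorized Q, so even a mean-field fit built from the exact marginals leaves a positive KL residual.

```python
import numpy as np

# True joint posterior over (s1, s2), deliberately correlated:
# probability mass sits on the diagonal, so it does not factorize.
post = np.array([[0.45, 0.05],
                 [0.05, 0.45]])

# A mean-field candidate: product of the exact marginals.
q1 = post.sum(axis=1)    # marginal over s1
q2 = post.sum(axis=0)    # marginal over s2
q = np.outer(q1, q2)     # factorized Q(s1, s2) = Q(s1) Q(s2)

# KL(Q || posterior): the residual no amount of iteration removes,
# because no product distribution equals this correlated posterior.
residual = np.sum(q * np.log(q / post))
print(residual)          # strictly positive
```

This is the irreducible gap the session describes: fixing it requires a richer family (one that can represent the correlation), not more iterations.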
Every practical Active Inference system tells you which of these three is costing the most. The Workbench uses the Glass signal river and the equation-by-equation provenance to let you attribute error by line of math.
The complexity/accuracy recipe
/cookbook/vfe-decompose-complexity-accuracy runs an agent and logs both terms per tick. You watch them trade off:
- Early ticks → accuracy dominates (a new observation means large −log P(o|s) until beliefs update).
- Mid run → the terms equalize (the agent has balanced its evidence against its prior).
- Convergence → complexity dominates (the remaining residual is mostly "how far Q has drifted from P(s)").
Two lines on one chart. The chart is the diagnostic.
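A hedged sketch of what the recipe logs, assuming a toy two-state agent with a known A matrix (illustrative only; the cookbook's actual instrumentation lives in the Workbench): per tick, record the complexity KL(Q || P(s)) and the accuracy cost −E_Q[log P(o|s)].

```python
import numpy as np

rng = np.random.default_rng(0)
prior = np.array([0.5, 0.5])     # P(s): the agent's fixed prior
A = np.array([[0.8, 0.3],        # P(o|s): rows = observations, cols = states
              [0.2, 0.7]])
true_state = 0
q = prior.copy()                 # Q starts at the prior
complexity_log, accuracy_log = [], []

for tick in range(10):
    o = rng.choice(2, p=A[:, true_state])            # sample an observation
    complexity_log.append(float(np.sum(q * np.log(q / prior))))
    accuracy_log.append(float(-np.sum(q * np.log(A[o]))))
    post = q * A[o]                                  # Bayes update of the belief
    q = post / post.sum()

# complexity_log starts at 0 (Q == prior) and grows as Q drifts;
# accuracy_log is the per-tick surprisal cost. Plot both for the chart.
```

The two lists are the "two lines on one chart": complexity rises from zero as beliefs drift from the prior, while the accuracy cost falls as observations become explained.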
The mental move
Free energy is not one thing. It's a sum of costs, and different bad-model pathologies load onto different summands. When your agent behaves weirdly, decompose F and the weirdness localizes. That is the difference between theory and working engineering.
Why this matters for Chapter 9
We'll return to this in full at Part 29 (Session 9.1). When you're fitting an Active Inference model to human behavioral data, the final F value is your log Bayes factor. But to debug the fit โ to know whether your model is systematically wrong or only contingently noisy โ you need the decomposition from this session. Chapter 2 plants the tool; Chapter 9 uses it.
The concepts this session surfaces
- Complexity: KL from the posterior to the prior; the "how far did Q drift" cost.
- Accuracy: expected log-likelihood; the "how well do you explain the data" reward.
- Energy: E_Q[−log P(o,s)]; the joint-model cost.
- Entropy: H[Q]; the belief's spread.
The quiz
Q: An agent's free energy stays high after many ticks of observation. The complexity term is tiny; the accuracy term is large. What's wrong?
- ✗ The variational family is too narrow.
- ✓ The likelihood P(o|s) is mis-specified.
- ✗ The prior P(s) is wrong.
- ✗ The softmax temperature is too low.
Why: Large accuracy means the agent's Q can't explain observations. If complexity is small, Q is close to P(s), so the drift from the prior isn't the issue. The likelihood model is predicting observations that don't match reality.
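The quiz pattern is easy to reproduce. An illustrative sketch (toy numbers, not Workbench code): a likelihood that assigns low probability to the observed outcome under every state keeps the accuracy cost large while Q barely moves from the prior.

```python
import numpy as np

prior = np.array([0.5, 0.5])       # P(s)
A_model = np.array([0.10, 0.20])   # model's P(o_observed|s): wrong for BOTH states

post = prior * A_model
q = post / post.sum()              # exact posterior under the (bad) model

complexity = np.sum(q * np.log(q / prior))       # KL(Q || P(s)): stays tiny
accuracy_cost = -np.sum(q * np.log(A_model))     # -E_Q[log P(o|s)]: stays large
print(complexity, accuracy_cost)   # tiny vs large: the quiz's signature
```

Because neither state explains the observation, no amount of belief-shifting reduces the accuracy cost, and the small complexity term rules out prior drift as the culprit.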
Run it yourself
- /learn/session/2/s3_cost_of_being_wrong – session page.
- /cookbook/vfe-decompose-complexity-accuracy – watch the terms split live.
- /cookbook/perception-noisy-sensor-robustness – how the accuracy term reacts to a bad likelihood.
- /cookbook/perception-sweep-iteration-budget – what happens if you stop F minimization early.
- /equations – VFE family, with both decompositions listed.
Next
Part 18: Session 2.4 – Action as inference. Chapter 2's fourth and final session. The move that flips free energy from "posterior machinery" to "policy machinery", and sets up Chapter 3's Expected Free Energy. We see what happens when you let the observation be a free variable.
⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license
📘 Active Inference, Parr, Pezzulo, Friston – MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference
← Part 16: Session 2.2 · Part 17: Session 2.3 (this post) · Part 18: Session 2.4 – coming soon