Series: The Learn Arc · 50 posts teaching Active Inference through a live BEAM-native workbench. ← Part 3: The Low Road. This is Part 4.
The hero line
The Workbench's canonical metadata renders Chapter 3 as:
Expected Free Energy: the value of a plan, as a bill with two lines.
That is the entire chapter. Two lines on one bill. Risk and ambiguity. The rest (softmax over policies, epistemic affordance, information gain vs reward) is how those two lines interact when you let time into the problem.
This is the chapter that earns the book its swagger. It's where Active Inference stops looking like "Bayesian inference with a coat on" and starts looking like a unified theory of every agent you've ever wanted to build.
Why reinforcement learning had to be rewritten
In classical RL, you pick actions that maximize expected reward. The reward is a scalar signal you hand the agent from the outside.
In Active Inference there is no reward. There's a preference distribution P(o), a prior over the observations the agent expects to see. When what the agent sees matches what it prefers, surprise is low. When it doesn't, surprise is high. The agent picks actions that drive future observations toward that preferred distribution, which, rewritten, gives you both goal-seeking AND curiosity, for free, from one gradient.
That sounds like a marketing claim. Chapter 3 is where you see the algebra that makes it true.
The one equation
Expected Free Energy, in its decomposed form (Eq. 3.7-ish, depending on how you split it):
G(policy) = E_Q[ log Q(state | policy) − log P(observation, state | policy) ]
          ≈ RISK + AMBIGUITY

RISK:      KL[ Q(o | π) ‖ P(o) ]
           "your expected observations under this policy don't match your preferences"
AMBIGUITY: E_Q[ H[ P(o | s) ] ]
           "how uncertain are you about what you'd see, even after acting?"
Two terms. One says how far your plan lands from your goal (risk, the pragmatic line). One says how much your plan leaves your model uncertain (ambiguity, the epistemic line). Sum them. Minimize over policies. Softmax to get a posterior over plans.
You didn't add curiosity. Curiosity is just the ambiguity term. The agent prefers plans it thinks will disambiguate the world, because those plans reduce future free energy.
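The whole bill fits in a few lines of NumPy. Here is a minimal sketch on a toy discrete model; the likelihood matrix, preferences, and policies are invented for illustration and are not the Workbench's actual code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy model: 3 observations x 3 hidden states (columns are states).
# State 2 is deliberately ambiguous: it emits every observation equally.
A = np.array([[0.90, 0.10, 1/3],
              [0.05, 0.80, 1/3],
              [0.05, 0.10, 1/3]])          # likelihood P(o|s)
log_C = np.log(np.array([0.8, 0.1, 0.1]))  # log preference prior, log P(o)

def expected_free_energy(qs):
    """Risk + ambiguity for a policy whose predicted state distribution is qs."""
    qo = A @ qs                                      # predicted observations Q(o|pi)
    risk = qo @ (np.log(qo) - log_C)                 # KL[ Q(o|pi) || P(o) ]
    ambiguity = qs @ (-(A * np.log(A)).sum(axis=0))  # E_Q[ H[ P(o|s) ] ]
    return risk + ambiguity

policies = [np.array([0.90, 0.05, 0.05]),   # plan A: mostly reaches state 0
            np.array([0.05, 0.05, 0.90])]   # plan B: mostly reaches ambiguous state 2
G = np.array([expected_free_energy(q) for q in policies])
posterior = softmax(-G)                     # posterior over plans ~ softmax(-G)
```

Plan A both lands near the preferred observation and keeps ambiguity low, so its G is smaller and the softmax gives it most of the posterior mass.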
What that looks like running
Open /cookbook/efe-decompose-epistemic-pragmatic and you can watch the split live. The recipe runs an agent through a maze and logs each policy's G value broken into its two columns, step by step:
Every bar in the policy-posterior chart carries two numbers. F (variational free energy) is Chapter 2's quantity: how well this policy's belief matches the current observation. G (expected free energy) is Chapter 3's: how good this plan looks going forward. The agent softmaxes over −G to pick the next action. You see the posterior reorder in real time every time the agent takes a step.
The recipe card has a four-tier explanation matching your learning path: kid (an analogy), real (the plain-English version above), equation (the decomposition as math), derivation (the KL expansion with all terms kept).
The deeply surprising thing
Three results fall out of Chapter 3 that rewrite what "rational action" means:
1. An Active Inference agent in an uncertain world will explore before exploiting, without any ε-greedy hack. The ambiguity term dominates when uncertainty is high. The agent's softmax picks information-seeking actions. As uncertainty falls, the risk term takes over and the agent drives toward preference. The "explore/exploit dial" is a mirage; it's one functional on a changing posterior.
2. Epistemic and pragmatic value can trade off, but cannot be separately weighted. Unlike intrinsic motivation in RL (where you add a curiosity bonus as a scalar), the two terms in EFE are natively commensurate: they're both in nats. No hyperparameter chooses their relative weight. (The softmax temperature sets the overall policy precision, but doesn't tilt risk vs ambiguity.)
3. The "goals" are priors. You don't give an Active Inference agent a reward function. You give it a P(o), a distribution over observations it prefers to see. Chapter 3 makes this formally equivalent to goal-directed behavior, and strictly more expressive than scalar rewards because you can express "I'd prefer to reach X, and also avoid Y, and also stay near Z." Multi-objective agency, no Pareto frontier wrangling.
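Result 1 can be seen numerically without simulating a maze. Hold the plan fixed and vary only how well-learned the likelihood is; both A matrices below are invented for illustration. When P(o|s) is vague, the ambiguity line dominates G; once it sharpens, risk takes over:

```python
import numpy as np

log_C = np.log(np.array([0.8, 0.1, 0.1]))  # preferences over 3 observations
qs = np.array([1/3, 1/3, 1/3])             # predicted states under some fixed plan

def split_G(A, qs):
    """Return (risk, ambiguity) for likelihood A and predicted states qs."""
    qo = A @ qs
    risk = qo @ (np.log(qo) - log_C)                  # KL[ Q(o|pi) || P(o) ]
    ambiguity = qs @ (-(A * np.log(A)).sum(axis=0))   # E_Q[ H[ P(o|s) ] ]
    return risk, ambiguity

I3 = np.eye(3)
U3 = np.full((3, 3), 1/3)
A_vague = 0.97 * U3 + 0.03 * I3  # barely-learned likelihood: observations near-uninformative
A_sharp = 0.03 * U3 + 0.97 * I3  # well-learned likelihood: observations diagnostic

risk_v, amb_v = split_G(A_vague, qs)  # here ambiguity dwarfs risk: go look
risk_s, amb_s = split_G(A_sharp, qs)  # here risk dwarfs ambiguity: go get it
```

No schedule, no dial: the same functional shifts from information-seeking to preference-seeking purely because the posterior changed.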
The epistemic exploration recipe
One of my favourite cookbook recipes maps this directly:
/cookbook/epistemic-info-gain-vs-reward runs two agents side by side: one with a strong preference (low temperature on P(o), so the pragmatic line dominates) and one with a diffuse preference (high temperature, so the epistemic line dominates). You watch them move differently in the same maze. The diffuse one wanders toward unexplored cells; the strong-preference one beelines for the goal.
Same agent architecture. Same equation. Same code. Different priors. That's the lesson.
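You can reproduce the recipe's punchline in a few lines: same likelihood, same two candidate plans, and only the preference prior changes. Every number here is made up for illustration; this is not the recipe's actual maze:

```python
import numpy as np

# 3 observations x 2 hidden states (columns): a noisy goal cell
# and a crisply-observed landmark cell.
A = np.array([[0.40, 0.05],
              [0.30, 0.90],
              [0.30, 0.05]])

def G(qs, C):
    """Expected free energy of a plan with predicted states qs under preferences C."""
    qo = A @ qs
    risk = qo @ (np.log(qo) - np.log(C))
    ambiguity = qs @ (-(A * np.log(A)).sum(axis=0))
    return risk + ambiguity

to_goal     = np.array([0.95, 0.05])  # plan that mostly reaches the goal cell
to_landmark = np.array([0.05, 0.95])  # plan that mostly reaches the landmark

C_sharp   = np.array([0.90, 0.05, 0.05])  # strong preference for observation 0
C_diffuse = np.array([1/3, 1/3, 1/3])     # near-indifferent preferences

# Sharp preferences: the pragmatic line dominates and the goal plan wins.
# Diffuse preferences: the epistemic line decides and the landmark plan wins.
sharp_pick   = min([to_goal, to_landmark], key=lambda qs: G(qs, C_sharp))
diffuse_pick = min([to_goal, to_landmark], key=lambda qs: G(qs, C_diffuse))
```

Same G, same A, same plans; flattening C flips the argmin. That is the two-priors-two-behaviors lesson in miniature.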
The sessions under this chapter
Chapter 3 has three sessions in the Workbench curriculum:
- Expected free energy in one page: the derivation above, expanded.
- Epistemic vs pragmatic value: the decomposition, step by step, with an agent running on a forked maze.
- The softmax policy: why the precision parameter on that softmax is itself a meaningful quantity the brain seems to implement (foreshadowing Chapter 5's neuromodulators).
Open /learn/chapter/3 to read the sessions. Each has a path-specific narration, an attributed excerpt from the book, the EFE equation in interactive form, and a linked lab.
Why this matters for what's coming
Chapters 1–3 are the theory spine. From here forward the book does something different: it builds. Chapter 4 turns the abstract generative model into A, B, C, D matrices. Chapter 5 shows the cortex as a factor graph. Chapter 6 gives you a step-by-step recipe for designing your own agent. Chapter 7 is the POMDP workhorse. The theory in Chapters 1–3 is the skeleton; Chapters 4–7 are the muscle.
And every one of those downstream chapters has one or more runnable recipes and one or more live Workbench surfaces the series will walk through.
Run it yourself
- /cookbook/efe-decompose-epistemic-pragmatic: watch the bill split live.
- /cookbook/epistemic-info-gain-vs-reward: watch two priors produce two behaviors.
- /cookbook/epistemic-curiosity-driver: drop the preference entirely; watch pure exploration.
- /cookbook/sophisticated-plan-tree-search: when the horizon grows, EFE search becomes tree search. Chapter 7's sophisticated planner in miniature.
- /labs?recipe=efe-decompose-epistemic-pragmatic: one-click run, see the episode loop.
The mental move
Before Part 5, write down what you think P(o) looks like for:
- A thermostat (1 dimension, 1 mode at 72°F).
- A honeybee returning to the hive (multi-modal, skewed to the hive's colour).
- A scientist picking the next experiment (wide, multi-modal, includes hypotheses they don't want confirmed but want to test).
Each one is a plan evaluator under the same single equation. The agent is not different โ only P(o) is different. That's the chapter's gift.
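If it helps to make the exercise concrete, here is one way those three priors might look as discrete distributions. The bins and numbers are entirely hypothetical; take the shapes seriously, not the values:

```python
import numpy as np

# Thermostat: 1-D observation (temperature in degrees F), one sharp mode at 72.
temps = np.arange(60, 85)
p_thermostat = np.exp(-0.5 * ((temps - 72) / 1.5) ** 2)
p_thermostat /= p_thermostat.sum()

# Honeybee: observation bins = landmark colours; multi-modal,
# skewed toward hive-like colours (bins 2-3 here, by assumption).
p_bee = np.array([0.05, 0.10, 0.45, 0.30, 0.10])

# Scientist: wide and multi-modal, with real mass on outcomes that would
# *refute* a pet hypothesis as well as on ones that would confirm it.
p_scientist = np.array([0.20, 0.15, 0.10, 0.15, 0.20, 0.20])
```

Feed any of these into the same EFE machinery and you get a thermostat, a forager, or an experimenter; the evaluator never changes.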
Next
Part 5: Chapter 4, The Generative Models of Active Inference. A, B, C, D matrices. Eq. 4.13 belief update. Eq. 4.14 policy posterior. Eq. 4.19 quadratic free energy in generalised coordinates. This is where we build the Lego canvas: pick hidden states, pick observations, pick actions, wire them into a Jido agent. Runnable as /cookbook/pomdp-tiny-corridor, composable as /builder/new?recipe=pomdp-tiny-corridor.
⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license
📖 Active Inference, Parr, Pezzulo, Friston. MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference
← Part 3: The Low Road · Part 4: The High Road (this post) · Part 5: Generative Models →