Series: The Learn Arc · 50 posts teaching Active Inference through a live BEAM-native workbench. ← Part 3: The Low Road. This is Part 4.
The hero line
The Workbench's canonical metadata renders Chapter 3 as:
Expected Free Energy: the value of a plan, as a bill with two lines.
That is the entire chapter. Two lines on one bill. Risk and ambiguity. The rest (softmax over policies, epistemic affordance, information gain vs reward) is how those two lines interact when you let time into the problem.
This is the chapter that earns the book its swagger. It's where Active Inference stops looking like "Bayesian inference with a coat on" and starts looking like a unified theory of every agent you've ever wanted to build.
Why reinforcement learning had to be rewritten
In classical RL, you pick actions that maximize expected reward. The reward is a scalar signal you hand the agent from the outside.
In Active Inference there is no reward. There's a preference distribution P(o), a prior over the observations the agent expects to see. When what the agent sees matches what it prefers, surprise is low. When it doesn't, surprise is high. The agent picks actions that drive future observations toward that preferred distribution, which, rewritten, gives you both goal-seeking AND curiosity, for free, from one gradient.
That sounds like a marketing claim. Chapter 3 is where you see the algebra that makes it true.
The one equation
Expected Free Energy, in its decomposed form (Eq. 3.7-ish, depending on how you split it):
G(policy) = E_Q[ log Q(state | policy) − log P(observation, state | policy) ]
          ≈ RISK + AMBIGUITY

RISK:      KL[ Q(o | π) ‖ P(o) ]
           "your expected observations under this policy don't match your preferences"
AMBIGUITY: E_Q[ H[ P(o | s) ] ]
           "how uncertain are you about what you'd see, even after acting?"
Two terms. One says how far your plan lands from your goal (risk, the pragmatic line). One says how much your plan leaves your model uncertain (ambiguity, the epistemic line). Sum them. Minimize over policies. Softmax to get a posterior over plans.
You didn't add curiosity. Curiosity is just the ambiguity term. The agent prefers plans it thinks will disambiguate the world, because those plans reduce future free energy.
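The whole bill fits in a few lines of NumPy. Here is a minimal sketch on a toy discrete model; the likelihood matrix, preferences, and policies are invented for illustration and are not the Workbench's actual code:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy model: 3 observations x 3 hidden states (columns are states).
# State 2 is deliberately ambiguous: it emits every observation equally.
A = np.array([[0.90, 0.10, 1/3],
              [0.05, 0.80, 1/3],
              [0.05, 0.10, 1/3]])          # likelihood P(o|s)
log_C = np.log(np.array([0.8, 0.1, 0.1]))  # log preference prior, log P(o)

def expected_free_energy(qs):
    """Risk + ambiguity for a policy whose predicted state distribution is qs."""
    qo = A @ qs                                      # predicted observations Q(o|pi)
    risk = qo @ (np.log(qo) - log_C)                 # KL[ Q(o|pi) || P(o) ]
    ambiguity = qs @ (-(A * np.log(A)).sum(axis=0))  # E_Q[ H[ P(o|s) ] ]
    return risk + ambiguity

policies = [np.array([0.90, 0.05, 0.05]),   # plan A: mostly reaches state 0
            np.array([0.05, 0.05, 0.90])]   # plan B: mostly reaches ambiguous state 2
G = np.array([expected_free_energy(q) for q in policies])
posterior = softmax(-G)                     # posterior over plans ~ softmax(-G)
```

Plan A both lands near the preferred observation and keeps ambiguity low, so its G is smaller and the softmax gives it most of the posterior mass.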
What that looks like running
Open /cookbook/efe-decompose-epistemic-pragmatic and you can watch the split live. The recipe runs an agent through a maze and logs each policy's G value broken into its two columns, step by step:
Every bar in the policy-posterior chart carries two numbers. F (variational free energy) is Chapter 2's quantity: how well this policy's belief matches the current observation. G (expected free energy) is Chapter 3's: how good this plan looks going forward. The agent softmaxes over −G to pick the next action. You see the posterior reorder in real time every time the agent takes a step.
The recipe card has a four-tier explanation matching your learning path: kid (an analogy), real (the plain-English version above), equation (the decomposition as math), derivation (the KL expansion with all terms kept).
The deeply surprising thing
Three results fall out of Chapter 3 that rewrite what "rational action" means:
1. An Active Inference agent in an uncertain world will explore before exploiting, without any ε-greedy hack. The ambiguity term dominates when uncertainty is high. The agent's softmax picks information-seeking actions. As uncertainty falls, the risk term takes over and the agent drives toward preference. The "explore/exploit dial" is a mirage; it's one functional on a changing posterior.
2. Epistemic and pragmatic value can trade off, but cannot be separately weighted. Unlike intrinsic motivation in RL (where you add a curiosity bonus as a scalar), the two terms in EFE are natively commensurate: they're both in nats. No hyperparameter chooses their relative weight. (The softmax temperature sets the overall policy precision, but doesn't tilt risk vs ambiguity.)
3. The "goals" are priors. You don't give an Active Inference agent a reward function. You give it a P(o), a distribution over observations it prefers to see. Chapter 3 makes this formally equivalent to goal-directed behavior, and strictly more expressive than scalar rewards because you can express "I'd prefer to reach X, and also avoid Y, and also stay near Z." Multi-objective agency, no Pareto frontier wrangling.
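Result 1 can be seen numerically without simulating a maze. Hold the plan fixed and vary only how well-learned the likelihood is; both A matrices below are invented for illustration. When P(o|s) is vague, the ambiguity line dominates G; once it sharpens, risk takes over:

```python
import numpy as np

log_C = np.log(np.array([0.8, 0.1, 0.1]))  # preferences over 3 observations
qs = np.array([1/3, 1/3, 1/3])             # predicted states under some fixed plan

def split_G(A, qs):
    """Return (risk, ambiguity) for likelihood A and predicted states qs."""
    qo = A @ qs
    risk = qo @ (np.log(qo) - log_C)                  # KL[ Q(o|pi) || P(o) ]
    ambiguity = qs @ (-(A * np.log(A)).sum(axis=0))   # E_Q[ H[ P(o|s) ] ]
    return risk, ambiguity

I3 = np.eye(3)
U3 = np.full((3, 3), 1/3)
A_vague = 0.97 * U3 + 0.03 * I3  # barely-learned likelihood: observations near-uninformative
A_sharp = 0.03 * U3 + 0.97 * I3  # well-learned likelihood: observations diagnostic

risk_v, amb_v = split_G(A_vague, qs)  # here ambiguity dwarfs risk: go look
risk_s, amb_s = split_G(A_sharp, qs)  # here risk dwarfs ambiguity: go get it
```

No schedule, no dial: the same functional shifts from information-seeking to preference-seeking purely because the posterior changed.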
The epistemic exploration recipe
One of my favourite cookbook recipes maps this directly:
/cookbook/epistemic-info-gain-vs-reward runs two agents side by side: one with a strong preference (low temperature on P(o), so the pragmatic line dominates) and one with a diffuse preference (high temperature, so the epistemic line dominates). You watch them move differently in the same maze. The diffuse one wanders toward unexplored cells; the strong-preference one beelines for the goal.
Same agent architecture. Same equation. Same code. Different priors. That's the lesson.
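You can reproduce the recipe's punchline in a few lines: same likelihood, same two candidate plans, and only the preference prior changes. Every number here is made up for illustration; this is not the recipe's actual maze:

```python
import numpy as np

# 3 observations x 2 hidden states (columns): a noisy goal cell
# and a crisply-observed landmark cell.
A = np.array([[0.40, 0.05],
              [0.30, 0.90],
              [0.30, 0.05]])

def G(qs, C):
    """Expected free energy of a plan with predicted states qs under preferences C."""
    qo = A @ qs
    risk = qo @ (np.log(qo) - np.log(C))
    ambiguity = qs @ (-(A * np.log(A)).sum(axis=0))
    return risk + ambiguity

to_goal     = np.array([0.95, 0.05])  # plan that mostly reaches the goal cell
to_landmark = np.array([0.05, 0.95])  # plan that mostly reaches the landmark

C_sharp   = np.array([0.90, 0.05, 0.05])  # strong preference for observation 0
C_diffuse = np.array([1/3, 1/3, 1/3])     # near-indifferent preferences

# Sharp preferences: the pragmatic line dominates and the goal plan wins.
# Diffuse preferences: the epistemic line decides and the landmark plan wins.
sharp_pick   = min([to_goal, to_landmark], key=lambda qs: G(qs, C_sharp))
diffuse_pick = min([to_goal, to_landmark], key=lambda qs: G(qs, C_diffuse))
```

Same G, same A, same plans; flattening C flips the argmin. That is the two-priors-two-behaviors lesson in miniature.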
The sessions under this chapter
Chapter 3 has three sessions in the Workbench curriculum:
- Expected free energy in one page: the derivation above, expanded.
- Epistemic vs pragmatic value: the decomposition, step by step, with an agent running on a forked maze.
- The softmax policy: why the precision parameter on that softmax is itself a meaningful quantity the brain seems to implement (foreshadowing Chapter 5's neuromodulators).
Open /learn/chapter/3 to read the sessions. Each has a path-specific narration, an attributed excerpt from the book, the EFE equation in interactive form, and a linked lab.
Why this matters for what's coming
Chapters 1–3 are the theory spine. From here forward the book does something different: it builds. Chapter 4 turns the abstract generative model into A, B, C, D matrices. Chapter 5 shows the cortex as a factor graph. Chapter 6 gives you a step-by-step recipe for designing your own agent. Chapter 7 is the POMDP workhorse. The theory in Chapters 1–3 is the skeleton; Chapters 4–7 are the muscle.
And every one of those downstream chapters has one or more runnable recipes and one or more live Workbench surfaces the series will walk through.
Run it yourself
- /cookbook/efe-decompose-epistemic-pragmatic: watch the bill split live.
- /cookbook/epistemic-info-gain-vs-reward: watch two priors produce two behaviors.
- /cookbook/epistemic-curiosity-driver: drop the preference entirely; watch pure exploration.
- /cookbook/sophisticated-plan-tree-search: when the horizon grows, EFE search becomes tree search. Chapter 7's sophisticated planner in miniature.
- /labs?recipe=efe-decompose-epistemic-pragmatic: one-click run, see the episode loop.
The mental move
Before Part 5, write down what you think P(o) looks like for:
- A thermostat (1 dimension, 1 mode at 72°F).
- A honeybee returning to the hive (multi-modal, skewed to the hive's colour).
- A scientist picking the next experiment (wide, multi-modal, includes hypotheses they don't want confirmed but want to test).
Each one is a plan evaluator under the same single equation. The agent is not different โ only P(o) is different. That's the chapter's gift.
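If it helps to make the exercise concrete, here is one way those three priors might look as discrete distributions. The bins and numbers are entirely hypothetical; take the shapes seriously, not the values:

```python
import numpy as np

# Thermostat: 1-D observation (temperature in degrees F), one sharp mode at 72.
temps = np.arange(60, 85)
p_thermostat = np.exp(-0.5 * ((temps - 72) / 1.5) ** 2)
p_thermostat /= p_thermostat.sum()

# Honeybee: observation bins = landmark colours; multi-modal,
# skewed toward hive-like colours (bins 2-3 here, by assumption).
p_bee = np.array([0.05, 0.10, 0.45, 0.30, 0.10])

# Scientist: wide and multi-modal, with real mass on outcomes that would
# *refute* a pet hypothesis as well as on ones that would confirm it.
p_scientist = np.array([0.20, 0.15, 0.10, 0.15, 0.20, 0.20])
```

Feed any of these into the same EFE machinery and you get a thermostat, a forager, or an experimenter; the evaluator never changes.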
Next
Part 5: Chapter 4, The Generative Models of Active Inference. A, B, C, D matrices. Eq. 4.13 belief update. Eq. 4.14 policy posterior. Eq. 4.19 quadratic free energy in generalised coordinates. This is where we build the Lego canvas: pick hidden states, pick observations, pick actions, wire them into a Jido agent. Runnable as /cookbook/pomdp-tiny-corridor, composable as /builder/new?recipe=pomdp-tiny-corridor.
⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license
📖 Active Inference, Parr, Pezzulo, Friston. MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference
← Part 3: The Low Road · Part 4: The High Road (this post) · Part 5: Generative Models →