Series: The Learn Arc – 50 posts teaching Active Inference through a live BEAM-native workbench. Part 25 covered Session 4.3; this is Part 26.
The session
Chapter 4, §4. Session title: MDP world. Route: /learn/session/4/s4_mdp_world.
Sessions 4.1–4.3 introduced the three lists, the A matrix, and the concrete EFE computation. Session 4.4 wires them all together for the first time: a fully-specified MDP world where you can watch every matrix do its job against a Jido agent running the full Perceive → Plan → Act loop.
The world
The canonical teaching MDP in the Workbench is tiny_open_goal – a 3×3 maze with:
- States: 9 grid cells {(0,0), (0,1), ..., (2,2)}.
- Observations: a 6-channel Markov blanket (wall-north/south/east/west + at-goal + position-hint) with moderate noise on A.
- Actions: {north, south, east, west} – four cardinal moves.
- Preferences: concentrated mass on the goal cell.
- Dynamics: B is mostly deterministic (walls block, open cells move), with a small chance of slipping.
All four matrices. One episode. Run button. The next 60 seconds of watching that run is where every piece you've been learning lands.
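That spec translates directly into matrices. Here is a minimal NumPy sketch, collapsing the 6-channel blanket into a one-hot position observation, using a deterministic B (the real bundle adds a small slip probability), and illustrative noise and preference values rather than the Workbench's actual numbers:

```python
import numpy as np

n_states = 9                          # 3x3 grid, row-major: s = 3*row + col

# A: p(o|s). Near-identity plus a little observation noise.
noise = 0.05
A = (1 - noise) * np.eye(n_states) + noise / n_states
A = A / A.sum(axis=0)                 # each column is a distribution over o

# B: p(s'|s, a), one slice per action; deterministic moves, walls block.
actions = ["north", "south", "east", "west"]
deltas = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}
B = np.zeros((len(actions), n_states, n_states))
for ai, a in enumerate(actions):
    dr, dc = deltas[a]
    for s in range(n_states):
        r, c = divmod(s, 3)
        r2, c2 = r + dr, c + dc
        s2 = 3 * r2 + c2 if (0 <= r2 < 3 and 0 <= c2 < 3) else s
        B[ai, s2, s] = 1.0            # column s lands in s2 (walls = stay put)

# C: preference mass concentrated on the goal cell (2,2) = state 8.
C = np.full(n_states, 0.1 / (n_states - 1))
C[8] = 0.9

# D: uniform prior over the 9 states.
D = np.full(n_states, 1.0 / n_states)
```

Every column of A and of each B slice is a probability distribution, which is the invariant the Workbench's bundle validation also enforces on load.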
What you actually see
The LiveView surface on /world with Tiny Open Goal:
- Maze grid – 3×3 cells, the agent's position marked @, the goal marked with its own symbol.
- Belief heatmap – 3×3 cells shaded by the Q(s) marginal. Bright = likely current state.
- Policy posterior table – top-5 policies with F, G, risk, and ambiguity columns.
- Step history – a strip of action labels scrolling right.
- Controls – Step, Run, Pause, Reset.
Five panels. Four matrices. One agent. Press Step ten times and you've seen the whole chapter fold into motion.
The agent's internal state
Every time you press Step, the agent's internal state updates in place. It holds:
```elixir
%{
  agent_id: "agent-tiny-...",
  Q_s: [[0.02, 0.03, ...], ...],                # belief over 9 states, per tick
  policy_posterior: [...],                      # Q(π) = softmax(-γG - F)
  best_policy: [:east, :east, :north, :north],  # argmax first-step
  best_f: -0.41, best_g: 0.18,                  # policy-posterior stats
  last_action: :east,
  bundle: %{A: ..., B: ..., C: ..., D: ...},    # the four matrices
  ...
}
```
The bundle field is where A, B, C, and D live. They're not recomputed each tick (unless the agent is Dirichlet-learning them) – they're the fixed structure the agent uses to perceive and plan.
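What one Step press does with that bundle can be sketched in NumPy rather than Elixir. This is illustrative only: it scores one-step policies and folds F into the softmax implicitly by omitting it, whereas the Workbench scores full multi-step policies with softmax(-γG - F); the names and the γ value are assumptions, not the real API.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())           # stabilized softmax
    return e / e.sum()

def step(bundle, Q_s, o, gamma=4.0):
    """One Perceive -> Plan -> Act tick over one-step policies (sketch)."""
    A, B, C = bundle["A"], bundle["B"], bundle["C"]
    eps = 1e-16

    # Perceive: Bayesian belief update, Q(s) proportional to p(o|s) Q(s).
    Q_s = A[o, :] * Q_s
    Q_s = Q_s / Q_s.sum()

    # Plan: expected free energy G = risk + ambiguity, per action.
    H_cols = -np.sum(A * np.log(A + eps), axis=0)   # H[A_col(s)] per state
    G = np.zeros(B.shape[0])
    for ai in range(B.shape[0]):
        Q_next = B[ai] @ Q_s                        # predicted next-state belief
        Q_o = A @ Q_next                            # predicted observation dist
        risk = np.sum(Q_o * (np.log(Q_o + eps) - np.log(C + eps)))
        G[ai] = risk + H_cols @ Q_next              # ambiguity = E_Q[H[A_col(s)]]

    # Act: policy posterior, then take its most probable first action.
    Q_pi = softmax(-gamma * G)
    return Q_s, Q_pi, int(np.argmax(Q_pi))
```

The three phases map one-to-one onto the perceive / plan_G / act signals the Glass audit shows later in this post.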
Why MDPs get a dedicated session
MDPs are a degenerate case of POMDPs where observation = state (A is identity). Chapter 4 covers them anyway because:
1. MDPs are where a lot of RL intuition lives. If you've done gridworld RL, you've lived in MDP world. Session 4.4 bridges the vocabularies: in Active Inference, an MDP is just a POMDP with identity A, and every RL technique you know maps onto a special case of the framework.
2. Ambiguity drops out. When A is identity, H[A_col(s)] = 0 for every s. The ambiguity term in G vanishes; only risk remains. The agent becomes purely pragmatic. This is useful pedagogically โ you can isolate the risk term and study its behavior without ambiguity muddying the picture.
3. Debugging is easier. When the agent's belief doesn't match reality in a POMDP, you don't know if it's A's fault, B's fault, or a sensor glitch. In an MDP, observation = state, so if belief is wrong, it's B or the prior.
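Point 2 above is easy to check numerically. A small sketch with illustrative matrices (not the session's actual A):

```python
import numpy as np

def column_entropies(A):
    """H[A_col(s)]: entropy of p(o|s) for each state s."""
    A = np.asarray(A, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(A > 0, A * np.log(A), 0.0)
    return -terms.sum(axis=0)

identity_A = np.eye(9)                    # MDP: observation = state
noisy_A = 0.9 * np.eye(9) + 0.1 / 9       # blurred, POMDP-like sensor
noisy_A = noisy_A / noisy_A.sum(axis=0)

h_mdp = column_entropies(identity_A)      # all zeros: ambiguity vanishes
h_pomdp = column_entropies(noisy_A)       # all positive: ambiguity re-enters G
```

With identity A every column entropy is exactly zero, so the ambiguity term contributes nothing to G no matter what the beliefs are; any blur on A makes it strictly positive again.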
The worked example
The session's worked example opens tiny_open_goal and walks tick-by-tick:
- Tick 0: the agent starts at (0,0) with a uniform prior; the belief heatmap is flat.
- Tick 1: it observes wall-north + wall-west; belief sharpens toward (0,0).
- Tick 2: it picks east (G minimized by approach-plus-discover); arrives at (1,0); belief jumps.
- Tick 3: it observes a new wall pattern; belief concentrates at (1,0).
- ...and so on through to the goal.
Every tick the book predicts what the agent should do next. Every tick the Workbench shows you what it actually did. The match is the proof Chapters 1–4 earned.
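The tick-0 → tick-1 belief jump is a single Bayes update. With made-up likelihood numbers standing in for A's wall channels (the real values come from the bundle):

```python
import numpy as np

# Tick 0: uniform prior over the 9 cells -- a flat belief heatmap.
Q_s = np.full(9, 1.0 / 9)

# Tick 1: likelihood of seeing "wall-north + wall-west" in each cell.
# (0,0) is the only cell with both walls; the small leakage elsewhere
# stands in for A's observation noise.
likelihood = np.full(9, 0.02)
likelihood[0] = 0.9

# Bayes: Q(s) proportional to p(o|s) Q(s), then renormalize.
Q_s = likelihood * Q_s
Q_s = Q_s / Q_s.sum()
print(Q_s[0])   # ~0.85: belief mass piles onto (0,0)
```

One observation, one multiply-and-renormalize, and the heatmap goes from flat to a bright corner cell.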
The Glass audit
Open /glass/agent/<id> while /world is running. You'll see one signal per equation-fire per tick:
12:34:56.789 perceive eq_4_13_state_belief_update Q_s updated
12:34:56.792 plan_G eq_4_14_policy_posterior G computed per policy
12:34:56.793 plan_F eq_2_6_vfe F computed per policy
12:34:56.794 softmax eq_4_14_policy_posterior policy_posterior produced
12:34:56.795 act eq_4_14_policy_posterior action emitted
That's Chapter 4 running. Click any equation ID → land on the equation page → read the math → come back to the signal and inspect its payload. The whole chapter in one audit trail.
The concepts this session surfaces
- MDP – Markov Decision Process; A is the identity.
- POMDP degenerate case – an MDP is a POMDP with a deterministic sensor.
- Bundle – the A/B/C/D struct carried by every agent.
- Agent state – the GenServer's live reconstruction of Q and its cache.
The quiz
Q: In the tiny-open-goal MDP, why does the ambiguity term in G stay near zero?
- ✗ The world is too small for ambiguity to matter.
- ✓ The A matrix is near-identity, so H[A_col(s)] ≈ 0 for every s.
- ✗ The softmax temperature is too high.
- ✗ The Dirichlet prior is too strong.
Why: In an MDP the observation deterministically identifies the state, so the per-column entropy of A is near zero. With H ≈ 0, the ambiguity term in G drops out; only risk drives policy selection. That's exactly what MDPs are: fully observed states.
Run it yourself
- /learn/session/4/s4_mdp_world – the session page.
- /world – Tiny Open Goal; press Step 10 times.
- /cookbook/pomdp-tiny-corridor – a proper POMDP for contrast (non-identity A).
- /glass – per-tick signal audit.
The mental move
After Session 4.4, Active Inference stops being a theoretical commitment and starts being an engineering style. You pick three lists, fill four matrices, compute two equations per tick, and softmax the result. The world shows you whether your choices were right.
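"Compute two equations per tick and softmax the result" is literally the planner step. A sketch using the softmax form from the agent-state comment above, with an assumed precision γ and made-up per-policy scores in the spirit of the best_g / best_f fields:

```python
import numpy as np

def policy_posterior(G, F, gamma=4.0):
    """Q(pi) = softmax(-gamma*G - F): lower expected free energy (G) and
    lower variational free energy (F) both raise a policy's probability."""
    x = -gamma * np.asarray(G, dtype=float) - np.asarray(F, dtype=float)
    e = np.exp(x - x.max())                  # stabilized softmax
    return e / e.sum()

# Illustrative scores for three candidate policies.
G = np.array([0.18, 0.55, 0.90])
F = np.array([-0.41, -0.30, 0.10])
Q_pi = policy_posterior(G, F)
```

The first policy, with the lowest G and lowest F, dominates the posterior; γ just sharpens or flattens how decisively it wins.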
Next
Part 27: Session 4.5 – Practice. Chapter 4's closing session is a full-session workshop: build your own generative model from scratch, save it as a spec, instantiate a Studio-tracked agent, attach it to a world, run and inspect. The whole chapter as a single extended exercise.
⭐ Repo: github.com/TMDLRG/TheORCHESTRATEActiveInferenceWorkbench · MIT license
📖 Active Inference, Parr, Pezzulo, Friston – MIT Press 2022, CC BY-NC-ND: mitpress.mit.edu/9780262045353/active-inference
← Part 25: Session 4.3 · Part 26: Session 4.4 (this post) · Part 27: Session 4.5 – coming soon