Reinforcement Learning Cheat Sheet (Exam Killer Version)
1. Core Idea (Write This in Any Answer Intro)
Reinforcement Learning is a learning paradigm where an agent interacts with an environment and learns to take actions that maximize cumulative reward over time.
Keywords to include:
Trial and error
Reward signal
Sequential decision making
2. RL Framework (Must Draw in Exam)
Agent → Action → Environment → Reward → New State
Write:
Agent (decision maker)
Environment (external system)
State (current situation)
Action (choice)
Reward (feedback)
Example (very important for marks):
Game playing / robot navigation
3. Markov Decision Process (MDP)
Definition:
An MDP is a mathematical model of sequential decision making, used to formalize RL problems.
Tuple:
(S, A, P, R, γ)
S → States
A → Actions
P → Transition probability
R → Reward
γ → Discount factor
Key concept:
Markov Property → The future depends only on the present state (and the action taken), not on the earlier history
4. Return & Discount Factor
Return: G_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + ... (discounted sum of future rewards)
γ ∈ [0, 1]
High γ → future rewards matter
Low γ → immediate reward matters
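To see the effect of γ on the return G = r1 + γ·r2 + γ²·r3 + ..., here is a minimal sketch. The function name and the three-step reward list are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of one episode's rewards: r1 + gamma*r2 + gamma^2*r3 + ..."""
    g = 0.0
    # Work backwards so each step applies G_t = r_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 1.0, 1.0]  # hypothetical rewards from a 3-step episode

# High gamma: later rewards still contribute a lot (about 2.71 here).
print(discounted_return(rewards, 0.9))
# Low gamma: the first reward dominates (about 1.11 here).
print(discounted_return(rewards, 0.1))
```

With γ = 0 the return collapses to the first reward only, which is the extreme "immediate reward matters" case.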
5. Value Functions (Very Important)
State Value: V(s) → how good a state is
Action Value: Q(s,a) → how good an action is in a given state
Always mention:
"Expected cumulative reward"
6. Bellman Equation (CORE CONCEPT)
Key idea:
Breaks the problem into smaller subproblems
Recursive nature: value of a state = immediate reward + discounted value of the next state
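For a fixed policy π, the Bellman expectation equation can be written as V(s) = Σ_a π(a|s) Σ_s' P(s'|s,a) [R + γ V(s')]. The sketch below applies that recursion repeatedly until the values settle. It assumes the policy has already been folded into a state-to-state transition matrix P and a per-state reward vector R, and the two-state example is hypothetical.

```python
def evaluate_policy(P, R, gamma, sweeps=1000):
    """Repeatedly apply V(s) <- R[s] + gamma * sum over s2 of P[s][s2] * V(s2)."""
    n = len(R)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [R[s] + gamma * sum(P[s][s2] * V[s2] for s2 in range(n))
             for s in range(n)]
    return V


# Hypothetical 2-state chain under a fixed policy:
# state 0 always moves to state 1 (reward 0); state 1 stays put (reward 1).
P = [[0.0, 1.0],
     [0.0, 1.0]]
R = [0.0, 1.0]

V = evaluate_policy(P, R, gamma=0.5)
print(V)  # close to [1.0, 2.0]
```

Checking by hand: V(1) = 1 + 0.5·V(1) gives V(1) = 2, and V(0) = 0 + 0.5·V(1) = 1, which shows how each value is defined in terms of the next state's value.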
7. Policy
Policy = strategy of agent
Deterministic → fixed action
Stochastic → probability-based
Write:
π(a|s)
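One way to picture the two kinds of policy is as lookup tables. This is only an illustrative sketch: the state names, action names, and probabilities are invented.

```python
import random

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"low_battery": "recharge", "full_battery": "explore"}

# Stochastic policy pi(a|s): each state maps to a probability distribution over actions.
stochastic_policy = {
    "low_battery": {"recharge": 0.9, "explore": 0.1},
    "full_battery": {"recharge": 0.2, "explore": 0.8},
}


def sample_action(policy, state, rng=random):
    """Draw an action a with probability pi(a|state)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


print(deterministic_policy["low_battery"])               # always "recharge"
print(sample_action(stochastic_policy, "low_battery"))   # usually "recharge", sometimes "explore"
```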
8. Q-Learning (Most Important Algorithm)
Off-policy
Uses the max estimated Q-value over next actions in its update target
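The tabular Q-learning update is Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') − Q(s,a)]. Below is a minimal sketch of that single step. The dictionary-based table, the state and action names, and the default α and γ are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, done=False):
    """One Q-learning step; the target bootstraps from the best next action."""
    best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in actions)
    target = r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


actions = ["left", "right"]   # hypothetical action set
Q = defaultdict(float)        # unseen (state, action) pairs start at 0
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 5.0

# The target uses max(1.0, 5.0) = 5.0, whichever action the agent takes next in s1.
new_q = q_learning_update(Q, "s0", "right", r=1.0, s_next="s1", actions=actions)
print(new_q)  # about 0.55 = 0.1 * (1.0 + 0.9 * 5.0)
```

Because the max ignores what the agent actually does next, the learned values describe the greedy policy even while the agent explores, which is what "off-policy" refers to.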
9. SARSA
On-policy
Uses the Q-value of the next action actually taken in its update target
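The SARSA update is Q(s,a) ← Q(s,a) + α [r + γ Q(s',a') − Q(s,a)], where a' is the action the agent really takes in s'. A minimal sketch of one step follows. The table layout, names, and default α and γ are illustrative assumptions.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, done=False):
    """One SARSA step; the target bootstraps from the action actually taken next."""
    next_value = 0.0 if done else Q[(s_next, a_next)]
    target = r + gamma * next_value
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)        # unseen (state, action) pairs start at 0
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 5.0

# Suppose the agent takes "left" in s1 (for example an exploratory move):
# the target then uses Q(s1, left) = 1.0, not the larger Q(s1, right) = 5.0.
new_q = sarsa_update(Q, "s0", "right", r=1.0, s_next="s1", a_next="left")
print(new_q)  # about 0.19 = 0.1 * (1.0 + 0.9 * 1.0)
```

The name comes from the tuple used in each update: (S, A, R, S', A').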
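10. Q-Learning vs SARSA (Exam Favorite)
Q-Learning → learns the values of the greedy policy, whatever policy the agent follows while exploring
SARSA → learns the values of the policy the agent is actually following, exploration included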
11. Exploration vs Exploitation
Exploration → try new actions
Exploitation → use best known
Method:
Epsilon-greedy
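Epsilon-greedy picks a random action with probability ε and the best-known action otherwise. A minimal sketch, with made-up action names and value estimates:

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """q_values maps each action to its estimated value in the current state."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))        # explore: uniform random action
    return max(q_values, key=q_values.get)       # exploit: best-known action


q_values = {"left": 0.2, "right": 0.7}   # hypothetical estimates
print(epsilon_greedy(q_values, epsilon=0.1))  # "right" most of the time
```

Setting ε = 0 gives pure exploitation and ε = 1 gives pure exploration. A common choice is to start with a larger ε and reduce it over time.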
12. Monte Carlo vs TD Learning
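The usual contrast is that Monte Carlo waits until the episode ends and updates toward the actual return, while TD learning updates after every step using its current estimate of the next state's value. The sketch below shows both for state values. It assumes a constant step size α, an every-visit Monte Carlo variant, and a hypothetical two-state episode A → B → end.

```python
def mc_update(V, episode, alpha=0.1, gamma=0.9):
    """Monte Carlo: after the episode ends, move V(s) toward the observed return.

    episode is a list of (state, reward) pairs, the reward being received on leaving the state.
    """
    g = 0.0
    for state, reward in reversed(episode):
        g = reward + gamma * g
        V[state] += alpha * (g - V[state])


def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """TD(0): update after one step, bootstrapping from the current estimate V(s_next)."""
    target = r + (0.0 if done else gamma * V[s_next])
    V[s] += alpha * (target - V[s])


# Same experience for both methods: A gives reward 0, then B gives reward 1 and the episode ends.
V_mc = {"A": 0.0, "B": 0.0}
mc_update(V_mc, [("A", 0.0), ("B", 1.0)])

V_td = {"A": 0.0, "B": 0.0}
td0_update(V_td, "A", 0.0, "B")
td0_update(V_td, "B", 1.0, None, done=True)

print(V_mc)  # A has already moved toward the return (about 0.09)
print(V_td)  # A is still 0.0: the reward reaches A only in later episodes
```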
13. Policy Iteration vs Value Iteration
Policy Iteration:
Evaluate → Improve, repeated until the policy stops changing
Value Iteration:
Directly update values with the Bellman optimality (max over actions) backup
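Both methods can be sketched on a tiny, fully known MDP. The two-state MDP below, its action names, and the fixed number of evaluation sweeps are assumptions made for illustration. The point is that the two procedures should end at the same policy.

```python
GAMMA = 0.9

# Hypothetical MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}


def q_value(V, s, a):
    """Expected reward plus discounted next-state value for action a in state s."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def greedy(V):
    """Policy that picks the highest-valued action in every state."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


def value_iteration(tol=1e-10):
    V = {s: 0.0 for s in P}
    while True:
        # Update values directly with a max over actions.
        new_V = {s: max(q_value(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V, greedy(new_V)
        V = new_V


def policy_iteration(eval_sweeps=500):
    policy = {s: "stay" for s in P}
    while True:
        V = {s: 0.0 for s in P}
        for _ in range(eval_sweeps):          # evaluate the current policy
            V = {s: q_value(V, s, policy[s]) for s in P}
        improved = greedy(V)                  # improve it greedily
        if improved == policy:
            return V, policy
        policy = improved


V_vi, pi_vi = value_iteration()
V_pi, pi_pi = policy_iteration()
print(pi_vi, pi_pi)  # both: go from state 0, stay in state 1
```

Here staying in state 1 earns 2 per step, so V(1) = 2 / (1 − 0.9) = 20 and V(0) = 1 + 0.9·20 = 19 under the optimal policy.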
14. Common Exam Mistakes (Avoid These)
Writing definitions without examples
Skipping diagrams
Not explaining formulas
No comparison tables
15. 1-Minute Revision Strategy
Before the exam, revise:
Bellman Equation
Q-Learning & SARSA
MDP
These alone can cover most of the paper.