The hook: I added includes and the page was still N+1
We had an admin page listing couriers, each row showing how many shipments that courier had. As data grew, the page went from 1 second to 7–8 seconds.
I opened the APM (New Relic): SQL alone took 6+ seconds, and the query count had exploded — one page listing 200 couriers fired 200+ queries. Classic N+1. I added includes(:shipments), the query count dropped to 2, the page came back under 1 second, and I installed the Bullet gem in dev to catch N+1 automatically. Problem solved, I thought.
Until another page — which already had includes, and which Bullet never warned about — was still slow, still N+1. Debugging that one taught me the real lesson: whether includes works depends on how you use the data it preloads, and you can only see that if you understand what ActiveRecord does underneath.
So this post doesn't start with "add includes." It starts one level deeper.
Same example throughout:
Courier has_many :shipments. Every step shows the actual SQL that gets sent.
Layer 0: every query passes through four stages
Every line of ActiveRecord you write runs through this pipeline:
① ActiveRecord syntax (Ruby)
Courier.where(...).includes(...)
just "accumulating conditions", lazy — hasn't touched the DB yet
↓ Rails translates
② SQL (a string)
SELECT ... FROM couriers WHERE ...
your conditions turned into something the DB understands
↓ sent
③ Database executes
where the real work happens: JOIN, WHERE filter, ORDER, DISTINCT
all of it here → returns a flat table of rows
↓ returned
④ Memory (Ruby objects)
Rails turns each row back into an ActiveRecord object for you
Remember two sentences — everything below is a corollary of them:
- Filtering, JOINs, sorting, deduping all happen in ③ the database. And the DB can only filter on columns that exist in that SQL statement — a column that isn't in the query can't be filtered on.
-
Objects,
.size, association arrays all happen in ④ memory. The data is already back; nothing here touches the DB.
ActiveRecord's selling point is hiding ②③ so it feels like there's only ① and ④. Great most of the time — but every performance problem lives in the ②③ you stopped looking at.
Layer 1: what N+1 actually is
Let's grab three rows with limit(3). The naive version (no preloading):
couriers = Courier.limit(3)
couriers.each do |c|
c.shipments.size # every iteration fires another query against ③
end
The actual SQL sent:
SELECT * FROM couriers LIMIT 3; -- the "1": fetch 3 couriers
SELECT * FROM shipments WHERE courier_id = 1; -- courier #1
SELECT * FROM shipments WHERE courier_id = 2; -- courier #2
SELECT * FROM shipments WHERE courier_id = 3; -- courier #3
Fetching 3 couriers is 1 query; each iteration adds 1 more for its shipments: 1 + 3 = 4. With 200 couriers, that's 201.
Why? Because c.shipments is lazy by default — touch it and it fires a fresh query against ③. Put that inside a loop and you get N queries.
Layer 2: three ways to preload, collapsing those N queries
The only difference is "how many SQL statements ② produces, and whether ③ does a JOIN." Same three rows:
preload — fetch separately (no JOIN)
Courier.preload(:shipments).limit(3)
SELECT * FROM couriers LIMIT 3; -- parent table
SELECT * FROM shipments WHERE courier_id IN (1, 2, 3); -- the N queries collapsed into one IN
Afterwards, Rails matches the shipments back to their courier in ④ memory using courier_id. Always 2 queries.
eager_load — one query with a LEFT JOIN
Courier.eager_load(:shipments).limit(3)
SELECT couriers.id AS t0_r0, couriers.name AS t0_r1,
shipments.id AS t1_r0, shipments.tracking_no AS t1_r1
FROM couriers
LEFT JOIN shipments ON shipments.courier_id = couriers.id;
One SQL statement, joining into a wide table. The columns get renamed to t0_* / t1_* because once the tables are joined, id and name collide — Rails numbers each column so it can split the row back into separate objects.
includes — let Rails choose
# A: no filtering on the association → runs as preload (the 2 queries above)
Courier.includes(:shipments).limit(3)
# B: filtering on a shipments column → auto-switches to eager_load (the JOIN above)
Courier.includes(:shipments).where(shipments: { status: "delivered" })
.references(:shipments)
includes isn't a third fetching strategy — it auto-switches: no association filtering → preload; filtering on an association column (add references) → eager_load (JOIN). Use it by default and let Rails decide.
So now the N queries are collapsed to a fixed count. But real pages aren't that simple — which is the payoff for that "includes still N+1" teaser. Here's the core.
Part 2 — After preloading: are you filtering the child or the parent?
Real pages rarely "fetch everything and show it" — they have conditions. So first answer one question, because it decides which solution you use:
- Filtering the child — every courier should only display a subset of its shipments (e.g. only delivered ones).
- Filtering the parent — you only want to keep couriers matching a condition (e.g. only couriers that have a delivered shipment).
These are two different needs with two different solutions. Conflating them is exactly how you write the "added includes but still N+1" code. Let's split them.
Need A: filter the child — each courier shows only its delivered shipments
Keep every courier, but each one only displays its "delivered" shipments.
❌ The naive way: .where on the association in the view (this is the bug from the hook)
# controller: preloads ALL shipments
couriers = Courier.includes(:shipments).limit(3)
<%# view: but each courier only wants the delivered ones %>
<% couriers.each do |c| %>
<%= c.shipments.where(status: "delivered").size %> <%# ← blows up here %>
<% end %>
The actual SQL:
-- includes preloads ALL shipments first (these 2 are fine on their own)
SELECT * FROM couriers LIMIT 3;
SELECT * FROM shipments WHERE courier_id IN (1, 2, 3);
-- but .where(status:) in the view is a NEW conditional query; Rails ignores the in-memory set
-- and re-queries once per courier:
SELECT * FROM shipments WHERE courier_id = 1 AND status = 'delivered';
SELECT * FROM shipments WHERE courier_id = 2 AND status = 'delivered';
SELECT * FROM shipments WHERE courier_id = 3 AND status = 'delivered';
Double waste: you spent 2 queries loading all shipments into ④ memory (never used), then .where re-queries once per courier → N+1 is fully back. And Bullet stays silent, because :shipments was preloaded — it only checks "did you preload," not "did you actually use what you preloaded."
Root cause (back to Layer 0): it's the nature of .where.
In ActiveRecord, .where is always defined as "build a new SQL statement and send it to ③ the database." It is not "filter the array in ④ memory." Two different layers:
- What you assumed: this courier's shipments are already in memory, so
.wherejust picks the delivered ones from that array. - What actually happens: the moment
.whereis called, Rails builds aWHERE ... AND status = 'delivered'and re-queries ③, completely ignoring the in-memory set.
Why doesn't it filter in memory? Timing. The preload order is: ② builds and sends the "all shipments" SQL → ③ runs it → ④ loads it into memory. By the time you write .where in the view, that preload SQL was built, sent, and returned long ago — the condition arrives too late to be slipped into a statement that already ran. Rails can't go back and rewrite that SQL, so it builds a fresh one against the DB.
To filter the in-memory set without hitting the DB, use Ruby's
select(not SQL's):c.shipments.select { |s| s.status == "delivered" }— that genuinely filters the ④ array. But it still has to load everything first, which is rough on memory at scale. The clean fix is the scoped association below, making the condition take effect back at ②.
✅ The fix: a scoped association (weld the condition into the association definition)
Make the condition take effect at ② when the SQL is built, not patched on at ④. Define an association that carries the condition:
class Courier < ApplicationRecord
has_many :shipments
# scoped association: this association only ever contains delivered ones
has_many :delivered_shipments,
-> { where(status: "delivered") },
class_name: "Shipment"
end
# controller
couriers = Courier.includes(:delivered_shipments).limit(3)
<%# view: use it directly, no more .where %>
<% couriers.each do |c| %>
<%= c.delivered_shipments.size %>
<% end %>
The actual SQL — back to a fixed 2 queries, with the condition merged into the preload's IN:
SELECT * FROM couriers LIMIT 3;
SELECT * FROM shipments WHERE courier_id IN (1, 2, 3) AND status = 'delivered';
The key difference: the naive condition shows up at ④ memory (too late, doesn't match the preload); the scoped association makes the condition part of the association definition, so Rails sees it at ② when building the preload SQL and welds it straight into the same IN — ③ fetches exactly what's needed in one shot.
Analogy: sending someone to the warehouse. The naive way is "bring back everything," and only when it's in front of you do you say "I just wanted the delivered ones" → a wasted trip, and a second one to filter. The scoped association writes "only the delivered ones" on the list before they leave → one correct trip.
(Detail: an association scope must be wrapped in a lambda
-> { ... }— that's required Rails syntax. The lambda also means "run the condition on every query," which matters for time-based values like7.days.ago: with the lambda it's recomputed each query as "7 days before today"; without it, the value is computed once when the class loads and frozen at the server's boot date, so "last 7 days" silently drifts wrong as time passes. Ourstatus: "delivered"is a constant so it makes no difference here — but scopes are always written as lambdas regardless.)
Need B (advanced): filter the parent — keep only couriers that have a delivered shipment
List only couriers with at least one delivered shipment — drop the rest entirely. Note how this differs from Need A: A keeps all couriers and filters their shipments; B filters the couriers themselves.
⚠️ The naive way: eager_load + where (works, but "row explosion" on has_many)
To filter the parent by a child column (shipments.status), you must JOIN — preload can't do it (Layer 0: filtering happens in ③, so the column has to be in that SQL). The obvious tool is eager_load:
Courier.eager_load(:shipments).where(shipments: { status: "delivered" })
SELECT couriers.*, shipments.*
FROM couriers
LEFT JOIN shipments ON shipments.courier_id = couriers.id
WHERE shipments.status = 'delivered';
Correct result, but a performance trap: eager_load also pulls the shipment columns back to build objects, and on a has_many the parent gets duplicated. If a courier has 50 delivered shipments, that courier's columns appear in 50 rows:
courier_id | name | ... | shipment_id
1 | BlackCat | ... | 10
1 | BlackCat | ... | 11 ← BlackCat's whole row copied again
1 | BlackCat | ... | 12 ← and again (×50)
200 couriers × 50 each → 10,000 rows, parent columns copied over and over. Transfer size and memory both balloon, and Rails still has to dedupe it back into objects. At scale, it drags.
✅ The fix: joins to filter + distinct to dedupe + preload to load
The principle: "filtering" and "displaying" are two separate jobs — split them. Filter with joins (doesn't pull child columns, avoids the explosion), display with preload (a separate IN).
Two tools first:
joins: does an INNER JOIN but only SELECTs the parent's columns. The child is just a filter, never loaded into objects.
(Note: "doesn't pull the child back" isn't a property of JOIN — it's a consequence of joins only selecting the parent. JOIN connects tables; SELECT decides which columns come back. joins writes SELECT couriers.*; eager_load writes SELECT couriers.*, shipments.*. That's the whole difference.)
distinct: folds identical rows into one. Because joins on a has_many duplicates the parent, but the child columns aren't selected, those rows are "identical in every column" — so distinct collapses them into one.
Put together:
couriers = Courier.joins(:shipments) # INNER JOIN, only to filter
.where(shipments: { status: "delivered" }) # filter couriers by a child column
.distinct # drop duplicate couriers from the JOIN
.preload(:shipments) # separately load shipments to display
The actual SQL — two queries, each with one job:
-- ① joins + distinct: only to find couriers WITH a delivered shipment; child columns not selected → no explosion
SELECT DISTINCT couriers.* FROM couriers
INNER JOIN shipments ON shipments.courier_id = couriers.id
WHERE shipments.status = 'delivered';
-- say this returns couriers 1 and 2
-- ② preload: a separate IN to load the shipments to display, into ④ memory
SELECT * FROM shipments WHERE courier_id IN (1, 2);
The view uses c.shipments (already preloaded):
<% couriers.each do |c| %>
<%= c.name %>: <%= c.shipments.size %>
<% end %>
Filtering goes to the JOIN, loading goes to preload — so you get both "filter the parent by a child column" and "no row explosion."
Note that
preload(:shipments)here loads all shipments (including non-delivered ones). If you want "keep only couriers with a delivered shipment and display only the delivered ones," combine A and B:joins(:delivered_shipments).distinct.preload(:delivered_shipments). Scoped associations and joins compose.
Wrap-up: every rule is a corollary of the same foundation
Looking back, this post started from one "added includes but still N+1" incident and ended up at a single thing: for the line of Ruby you wrote, what happens in which layer.
Once the four layers click, all the scattered "rules" connect:
- preload can't filter by an association column — because filtering is in ③, and the child table isn't in the main SQL.
-
c.shipments.where(...)breaks the preload — because.wherere-queries ③, ignoring the ④ in-memory set. -
joinsdoesn't load the association,eager_loaddoes — the difference is which columns SELECT pulls. -
.sizedoesn't hit the DB,.countdoes —.sizecounts the ④ array,.countgoes back to ③ with a SQL statement.
And "Need A vs Need B" is just two sides of the same question: are you filtering the child or the parent? Answer that and the solution falls out — filter the child with a scoped association, filter the parent with joins (and preload if row explosion is a concern).
Appendix: the full N+1 map
"Filter child/parent" is one piece — don't mistake it for the whole. The full breakdown:
First cut: is it preloaded at all?
No → add includes (fixes ~80%, the basics)
Yes but still slow ↓
Second cut: HOW are you using the preloaded data?
├ Filtering with a condition → filter child (scoped association) / filter parent (joins + preload) ← this post
├ .count in a loop → use .size (counts memory) or counter_cache (skips preload entirely)
└ Nested associations: c.shipments.each { s.packages } → includes(shipments: :packages)
The first cut is the basics — an includes away. The one that trips people up is the second cut: "I did preload, why is it still slow?" That's what this post is about.
Optimization isn't memorizing a checklist — it's being able to ask the right question about a slow query: is this step in the DB or in memory? Is the column in that SQL statement? Am I filtering the child or the parent? Answer those and the fix usually follows. This lens carries over to any language's ORM (Django ORM, Eloquent, Hibernate), because they all do the same thing — turn objects into SQL, across databases.











