This is Part 4 of the Object Storage Internals series.
Previous articles covered core mental models, metadata bottlenecks, and consistency trade-offs.
When people think about object storage performance, they usually focus on reads and writes.
That makes sense.
Uploading an object sounds expensive.
Downloading an object sounds expensive.
Listing objects sounds trivial.
After all, how hard can this be?
LIST /photos/
Return all the objects and move on.
When I first started looking at object storage internals, I assumed listing was one of the easier operations.
I was wrong.
In many large-scale storage systems, listing objects is significantly more complicated than reading a single object.
Reading one object is usually straightforward
Suppose a client requests:
GET /photos/cat.png
The storage system only needs to answer a few questions:
- Does the object exist?
- Where is it stored?
- Which nodes hold the data?
Once metadata provides the answer, the object can be retrieved.
The operation is targeted.
The system knows exactly what it is looking for.
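Here is a minimal sketch of that targeted path. It assumes, hypothetically, that metadata is hash-partitioned by object key. `NUM_PARTITIONS`, `partition_for`, and the in-memory dictionaries are illustrative stand-ins, not any real system's API. The point is that a read computes exactly one partition to ask.

```python
from __future__ import annotations

import hashlib

# Hypothetical: metadata is split across a fixed number of partitions.
NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    """Map an object key to exactly one metadata partition via a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# Each partition maps object key -> storage nodes holding the data.
partitions: list[dict[str, list[str]]] = [{} for _ in range(NUM_PARTITIONS)]

def put_metadata(key: str, nodes: list[str]) -> None:
    partitions[partition_for(key)][key] = nodes

def get_locations(key: str) -> list[str] | None:
    """Answer 'does it exist, and where?' by consulting a single partition."""
    return partitions[partition_for(key)].get(key)

put_metadata("photos/cat.png", ["node-7", "node-12", "node-31"])
print(get_locations("photos/cat.png"))  # ['node-7', 'node-12', 'node-31']
print(get_locations("photos/dog.png"))  # None
```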
Listing is a completely different problem
Now consider:
LIST /photos/
The request no longer asks for one object.
It asks for every object matching a prefix.
In a small system this is easy.
In a large distributed storage system, it becomes surprisingly expensive.
Imagine:
- 100 billion objects
- Hundreds of storage nodes
- Metadata distributed across partitions
The answer to a listing request may be spread across dozens of machines.
No single node necessarily knows the complete answer.
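Why would no single node know the answer? Often the partitioning scheme is the reason. The sketch below contrasts two hypothetical schemes. Hash partitioning scatters keys that share a prefix across every partition. Range partitioning keeps neighboring keys together. The boundaries, key names, and partition count are made up for illustration.

```python
from __future__ import annotations

import bisect
import hashlib

NUM_PARTITIONS = 4

def hash_partition(key: str) -> int:
    """Hash partitioning: spreads keys evenly but ignores their ordering."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

# Hypothetical range boundaries: 4 ranges [..g), [g..p), [p..w), [w..)
BOUNDARIES = ["g", "p", "w"]

def range_partition(key: str) -> int:
    """Range partitioning: partition i owns keys in [BOUNDARIES[i-1], BOUNDARIES[i])."""
    return bisect.bisect_right(BOUNDARIES, key)

def range_partitions_for_prefix(prefix: str) -> list[int]:
    """Partitions a prefix LIST must visit (assumes a non-empty prefix)."""
    # Every key starting with `prefix` sorts in [prefix, upper).
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    first = range_partition(prefix)
    last = bisect.bisect_left(BOUNDARIES, upper)  # partition of keys just below upper
    return list(range(first, last + 1))

keys = [f"photos/img{i:04d}.jpg" for i in range(1000)]
hash_touched = sorted({hash_partition(k) for k in keys})
range_touched = range_partitions_for_prefix("photos/")
print("hash: ", hash_touched)   # [0, 1, 2, 3]: every partition holds some photos
print("range:", range_touched)  # [2]: all photos/ keys fall in [p..w)
```

Range partitioning lets a prefix LIST touch fewer partitions, but many writes under one prefix can create a hot spot. Hash partitioning spreads load evenly, and in exchange a LIST has to fan out to every partition.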
The system has to assemble reality
A read operation usually follows a path:
Object Name
↓
Metadata Lookup
↓
Storage Node
A listing operation often looks more like:
Client
↓
Metadata Partition 1
Metadata Partition 2
Metadata Partition 3
Metadata Partition N
↓
Merge Results
Sort Results
Remove Duplicates
Return Response
The system is effectively reconstructing a view of reality from multiple sources.
That takes work.
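The merge, sort, and deduplicate steps in the diagram can be sketched as a streaming k-way merge. This sketch makes two hypothetical assumptions. First, each partition returns entries already sorted by key. Second, duplicates (for example from overlapping replicas) carry a version number, and the newest copy wins. Real systems may resolve duplicates differently.

```python
from __future__ import annotations

import heapq
import itertools

# Hypothetical per-partition responses, each already sorted by key.
# An entry is (key, version); overlapping replicas may return the same key.
partition_results = [
    [("photos/a.png", 1), ("photos/d.png", 3)],
    [("photos/b.png", 2), ("photos/d.png", 4)],  # newer copy of d.png
    [("photos/c.png", 1)],
]

def merge_listing(results: list[list[tuple[str, int]]]) -> list[tuple[str, int]]:
    """Merge sorted partition results, keeping the newest version of each key."""
    merged = heapq.merge(*results)  # k-way merge into one sorted stream
    listing = []
    for key, group in itertools.groupby(merged, key=lambda entry: entry[0]):
        # Duplicates are adjacent after the merge; keep the highest version.
        listing.append(max(group, key=lambda entry: entry[1]))
    return listing

print(merge_listing(partition_results))
# [('photos/a.png', 1), ('photos/b.png', 2), ('photos/c.png', 1), ('photos/d.png', 4)]
```

`heapq.merge` holds only one pending entry per input. Merging and sorting therefore happen in a single pass, without first concatenating every partition's results.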
Consistency makes it harder
Things become more interesting when objects are changing while a listing operation is running.
Imagine:
Client A uploads object X
Client B deletes object Y
Client C performs LIST
What should Client C see?
The answer depends on the consistency guarantees of the system.
Some storage systems prioritize a consistent view.
Others prioritize performance and availability.
Either way, the metadata layer has to decide which state of the bucket the listing reflects.
This is one reason why listing is often a metadata problem rather than a storage problem.
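One way to give Client C a well-defined answer is to serve each LIST from a snapshot. Every metadata change gets a commit sequence number, and the listing reflects only changes at or before a chosen point. This is a simplified, MVCC-style sketch under that assumption, not a description of any particular product. `Change` and `list_at_snapshot` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Change:
    seq: int       # commit order assigned by the metadata layer
    key: str
    deleted: bool  # True marks a delete

def list_at_snapshot(log: list[Change], prefix: str, snapshot_seq: int) -> list[str]:
    """Replay changes committed at or before snapshot_seq and return live keys."""
    live: dict[str, bool] = {}
    for change in sorted(log, key=lambda c: c.seq):
        if change.seq > snapshot_seq:
            break
        if change.key.startswith(prefix):
            live[change.key] = not change.deleted
    return sorted(k for k, alive in live.items() if alive)

log = [
    Change(1, "photos/Y.png", deleted=False),  # Y existed beforehand
    Change(2, "photos/X.png", deleted=False),  # Client A uploads X
    Change(3, "photos/Y.png", deleted=True),   # Client B deletes Y
]

# Client C's answer depends on which snapshot its LIST is served from.
print(list_at_snapshot(log, "photos/", snapshot_seq=1))  # ['photos/Y.png']
print(list_at_snapshot(log, "photos/", snapshot_seq=2))  # ['photos/X.png', 'photos/Y.png']
print(list_at_snapshot(log, "photos/", snapshot_seq=3))  # ['photos/X.png']
```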
The first surprising lesson
Many engineers assume object storage is primarily about moving data.
In practice, large storage systems spend an enormous amount of effort managing information about data.
The actual object might be sitting safely on disk.
The hard part is determining whether that object should appear in a query result right now.
Why S3 listing behavior confused developers for years
Historically, developers occasionally encountered situations where:
- Upload succeeds
- Immediate LIST request occurs
- Object does not appear
The object existed.
The storage system had accepted the write.
The issue was that the metadata used for listing had not yet caught up with the write.
From the developer's perspective, it felt like a bug.
From the storage system's perspective, it was a consequence of the consistency model.
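This is one of the reasons object listing became such an important topic in storage architecture. Amazon S3 has provided strong read-after-write consistency, including for list operations, since December 2020; before that, a listing could lag behind recent writes.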
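Here is a toy model of that symptom. It assumes, hypothetically, that writes land in the object store immediately while the index that LIST reads is updated asynchronously. It illustrates the behavior developers saw, not how S3 was actually implemented. `LaggingStore` and its methods are made-up names.

```python
from __future__ import annotations

from collections import deque

class LaggingStore:
    """Toy model: writes are stored immediately; the listing index catches up later."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}   # authoritative object data
        self.listing_index: set[str] = set()  # what LIST reads
        self.pending: deque[str] = deque()    # index updates not yet applied

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data  # the write is accepted and stored...
        self.pending.append(key)  # ...but indexing happens asynchronously

    def get(self, key: str) -> bytes | None:
        return self.objects.get(key)

    def list_prefix(self, prefix: str) -> list[str]:
        return sorted(k for k in self.listing_index if k.startswith(prefix))

    def apply_pending(self) -> None:
        """Background propagation: drain queued updates into the listing index."""
        while self.pending:
            self.listing_index.add(self.pending.popleft())

store = LaggingStore()
store.put("photos/cat.png", b"...")
print(store.get("photos/cat.png") is not None)  # True: the object exists
print(store.list_prefix("photos/"))             # []: LIST has not caught up
store.apply_pending()
print(store.list_prefix("photos/"))             # ['photos/cat.png']
```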
Scale changes everything
Imagine a bucket containing:
10,000 objects
Listing is easy.
Now imagine:
10 billion objects
The problem changes completely.
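A rough back-of-the-envelope comparison shows how fast the gap grows. It assumes pages of 1,000 keys, the most a single S3 ListObjectsV2 response returns. It also assumes a hypothetical 50 ms per page request, with pages fetched one after another.

```python
from __future__ import annotations

PAGE_SIZE = 1_000  # assumption: keys returned per LIST page
MS_PER_PAGE = 50   # assumption: latency of one page request, fetched sequentially

def sequential_listing_cost(object_count: int) -> tuple[int, float]:
    """Return (pages needed, seconds spent) to list object_count keys one page at a time."""
    pages = (object_count + PAGE_SIZE - 1) // PAGE_SIZE  # integer ceiling division
    return pages, pages * MS_PER_PAGE / 1000

for count in (10_000, 10_000_000_000):
    pages, seconds = sequential_listing_cost(count)
    print(f"{count:,} objects: {pages:,} pages, ~{seconds:,.1f} s")
# 10,000 objects: 10 pages, ~0.5 s
# 10,000,000,000 objects: 10,000,000 pages, ~500,000.0 s
```

Under those assumed numbers, 500,000 seconds is almost six days of sequential paging. That helps explain why listing at scale gets parallelized across partitions and why pagination design matters.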
Questions suddenly appear:
- How should metadata be partitioned?
- How are results sorted?
- How is pagination handled?
- How much memory should listing consume?
- How many metadata servers participate?
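Pagination shows how these questions interact. A common approach is to resume from the last key returned rather than from a numeric offset. This is similar in spirit to the `StartAfter` and `MaxKeys` parameters of S3's ListObjectsV2, although S3's own continuation tokens are opaque. The sketch below assumes a single sorted key index. `list_page` is a hypothetical helper, and a distributed system would combine it with a merge across partitions.

```python
from __future__ import annotations

import bisect

def list_page(sorted_keys: list[str], prefix: str, start_after: str | None,
              max_keys: int) -> tuple[list[str], str | None]:
    """Return up to max_keys keys under `prefix` that sort after `start_after`,
    plus a continuation token (the last key returned) if more keys remain."""
    if max_keys < 1:
        raise ValueError("max_keys must be at least 1")
    if start_after is None or start_after < prefix:
        i = bisect.bisect_left(sorted_keys, prefix)        # first key >= prefix
    else:
        i = bisect.bisect_right(sorted_keys, start_after)  # first key > token
    page = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix) and len(page) < max_keys:
        page.append(sorted_keys[i])
        i += 1
    more = i < len(sorted_keys) and sorted_keys[i].startswith(prefix)
    return page, (page[-1] if more else None)

keys = sorted(["docs/readme.txt", "photos/a.png", "photos/b.png",
               "photos/c.png", "photos/d.png", "videos/x.mp4"])
pages = []
token = None
while True:
    page, token = list_page(keys, "photos/", token, max_keys=2)
    pages.append(page)
    if token is None:
        break
print(pages)  # [['photos/a.png', 'photos/b.png'], ['photos/c.png', 'photos/d.png']]
```

Because the token is a key rather than a position, inserting or deleting an object that sorts before the token does not shift later pages. An offset-based cursor, by contrast, could skip or repeat entries when the bucket changes between requests.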
The operation that looked simple now touches some of the most important architectural decisions in the entire system.
Why storage engineers care so much about metadata
After spending time studying object storage systems, I keep seeing one pattern:
Whenever something becomes difficult, metadata is usually involved.
Part 2 of this series argued that metadata is the real system.
Listing operations are a good example.
The object data itself is often not the challenge.
The challenge is maintaining an accurate and scalable view of billions of objects while the system is constantly changing.
The trade-off nobody sees
Users see:
LIST /photos/
Storage engineers see:
Distributed metadata
Consistency guarantees
Partitioning strategy
Pagination
Failure handling
Concurrency
Scalability
The API looks simple because the storage system absorbs the complexity.
That simplicity is expensive.
Key takeaway
Reading a single object is usually about finding data.
Listing objects is about understanding the state of the entire system.
That's why listing often becomes one of the most metadata-intensive operations in cloud storage.
The bigger the system becomes, the more difficult that problem gets.
Next in this series
Part 5: Why Object Storage Systems Avoid In-Place Updates
If updating a file seems simple on your laptop, why do many object storage systems prefer creating new versions instead of modifying existing data?







