Rate limiting sounds simple until you try to make it work across multiple server instances without breaking consistency.
I wanted to build something beyond the usual “store a counter in memory and block after 5 requests” demo. So I built APIShield, a distributed rate limiting system that works across multiple backend instances while enforcing limits consistently using Redis, Lua scripting, and MongoDB-backed dynamic rules.
This project ended up being a lot more interesting than I expected. It pushed me into questions around atomic operations, shared state, rule prioritization, and how to make a system feel closer to a production service than a middleware experiment.
In this post, I’ll walk through the architecture, the rate limiting strategies I implemented, and why I ended up moving the sliding window logic into Redis using Lua.
GitHub Repository: APIShield
What is APIShield?
APIShield is a distributed rate limiting system that supports:
- Fixed Window
- Sliding Window
- Token Bucket
- Dynamic rule management
- Violation tracking
- Admin dashboard for rule configuration
The main goal was to build a rate limiter that could run across multiple backend instances while still applying limits consistently, even when requests hit different servers.
Architecture
The system looks like this:
Client
  │
  ▼
Backend Instance 1 ─┐
Backend Instance 2 ─┼──> Redis (Centralized Enforcement)
                    │
                    └──> MongoDB (Rule Storage)
Why this setup?
If each backend instance keeps its own counters in memory, the rate limit breaks the moment traffic is distributed across servers.
For example:
- Backend 1 sees 3 requests
- Backend 2 sees 3 requests
- Limit is 5 requests
If both servers count independently, neither ever sees more than 3 requests, so all 6 get through instead of 5. In the worst case the client gets the full limit on every instance.
That’s why Redis acts as the centralized enforcement layer. Every backend instance checks and updates the same counters, so the limit stays consistent regardless of which instance receives the request.
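To make the difference concrete, here is a minimal in-memory sketch of the 3 + 3 example. It does not talk to Redis; a plain object stands in for the counter store, and the names `CounterStore`, `allowRequest`, and `countAllowed` are made up for illustration.

```typescript
// A counter store maps a client key to the number of requests seen so far.
type CounterStore = { [key: string]: number | undefined };

// Count one request against `store` and report whether it is within `limit`.
function allowRequest(store: CounterStore, clientKey: string, limit: number): boolean {
  const count = (store[clientKey] ?? 0) + 1;
  store[clientKey] = count;
  return count <= limit;
}

// Send `total` requests, alternating between two instances, and count how many pass.
function countAllowed(
  storeA: CounterStore,
  storeB: CounterStore,
  total: number,
  limit: number
): number {
  let allowed = 0;
  for (let i = 0; i < total; i++) {
    const store = i % 2 === 0 ? storeA : storeB;
    if (allowRequest(store, "client-1", limit)) allowed++;
  }
  return allowed;
}

// Separate in-memory stores: each instance only sees 3 of the 6 requests.
const allowedWithLocalCounters = countAllowed({}, {}, 6, 5);

// One shared store (the role Redis plays): both instances update the same counter.
const sharedStore: CounterStore = {};
const allowedWithSharedCounter = countAllowed(sharedStore, sharedStore, 6, 5);

console.log(allowedWithLocalCounters, allowedWithSharedCounter); // 6 5
```

With local counters all 6 requests pass; with the shared counter the 6th is rejected.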
MongoDB stores dynamic rate limit rules, which means limits can be configured without redeploying the backend.
The algorithms I implemented
1. Fixed Window
This is the simplest strategy.
A counter is maintained for a fixed time window, such as:
- 100 requests per minute
- 1000 requests per hour
Once the window resets, the counter resets too.
Pros
- Easy to implement
- Low memory overhead
- Fast
Cons
It has the classic boundary burst problem.
A client could send:
- 100 requests at 12:00:59
- another 100 at 12:01:01
and get 200 requests through in about two seconds, even though each window individually stays within its limit.
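The boundary burst is easy to reproduce with a small in-memory sketch. This is not the Redis-backed version; the class name `FixedWindowLimiter` and the idea of passing the clock in as `nowMs` are assumptions made so the example is deterministic.

```typescript
interface WindowState {
  windowStart: number;
  count: number;
}

class FixedWindowLimiter {
  private states: { [key: string]: WindowState | undefined } = {};
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    // Align the window to a fixed boundary, e.g. the start of the current minute.
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    let state = this.states[key];
    if (state === undefined || state.windowStart !== windowStart) {
      // A new window has started, so the counter resets.
      state = { windowStart: windowStart, count: 0 };
      this.states[key] = state;
    }
    if (state.count >= this.limit) return false;
    state.count++;
    return true;
  }
}

// 100 requests per minute; timestamps are milliseconds from an arbitrary origin.
const fixedLimiter = new FixedWindowLimiter(100, 60000);
let allowedAcrossBoundary = 0;
for (let i = 0; i < 100; i++) {
  if (fixedLimiter.allow("client-1", 59000)) allowedAcrossBoundary++; // 1s before the boundary
}
for (let i = 0; i < 100; i++) {
  if (fixedLimiter.allow("client-1", 61000)) allowedAcrossBoundary++; // 1s after the boundary
}
const extraAllowed = fixedLimiter.allow("client-1", 61500);

console.log(allowedAcrossBoundary, extraAllowed); // 200 false
```

All 200 requests around the boundary pass, and only the 201st is rejected.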
2. Sliding Window
To make enforcement more accurate, I implemented Sliding Window using Redis Sorted Sets.
Instead of counting requests in fixed buckets, the system stores request timestamps and continuously checks how many requests happened within the last N seconds.
At a high level, the logic is:
- Remove timestamps older than the active window
- Add the current request timestamp
- Count the remaining timestamps
- Block the request if the count exceeds the limit
This gives much more accurate rate limiting than fixed windows.
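The four steps above can be sketched in memory, with a plain array of timestamps standing in for the Redis Sorted Set. `SlidingWindowLimiter` and the injected `nowMs` clock are illustrative assumptions, not the project's actual code.

```typescript
class SlidingWindowLimiter {
  private timestamps: { [key: string]: number[] | undefined } = {};
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    // 1. Remove timestamps older than the active window.
    const recent = (this.timestamps[key] ?? []).filter((t) => t > cutoff);
    // 2. Add the current request timestamp.
    recent.push(nowMs);
    this.timestamps[key] = recent;
    // 3. Count the remaining timestamps; 4. block if the count exceeds the limit.
    return recent.length <= this.limit;
  }
}

// Same burst that defeats a fixed window: 100 requests just before the minute
// boundary and 100 just after, against a limit of 100 per minute.
const slidingLimiter = new SlidingWindowLimiter(100, 60000);
let allowedInBurst = 0;
for (let i = 0; i < 100; i++) {
  if (slidingLimiter.allow("client-1", 59000)) allowedInBurst++;
}
for (let i = 0; i < 100; i++) {
  if (slidingLimiter.allow("client-1", 61000)) allowedInBurst++;
}

console.log(allowedInBurst); // 100
```

One consequence of adding before counting, as in the step order above, is that rejected requests are recorded too, so a client that keeps hammering the API keeps its own window full.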
The problem with a naive implementation
My first approach was the straightforward one: perform the sliding window logic using multiple Redis commands from Node.js.
That meant every request required a sequence like:
- remove expired entries
- add current request
- count requests
- set expiry
It works, but it also means multiple Redis round trips per request. At small scale that’s acceptable. At higher traffic volumes it becomes wasteful, and commands from concurrent requests can interleave between the steps, when the whole operation really needs to be treated as one unit.
That’s what led me to Lua.
Why I used Lua for Sliding Window
To reduce latency and make the sliding window logic atomic, I moved it into a Lua script executed directly inside Redis.
Instead of making multiple Redis calls from Node.js, the backend sends a single script to Redis, and Redis performs the entire rate-limiting workflow in one go.
That solved three problems at once:
1. Atomicity
There’s no gap between removing old requests, inserting the new one, and checking the count. That avoids race conditions when multiple requests arrive close together.
2. Fewer network round trips
The logic runs directly inside Redis, so the backend doesn’t need to orchestrate multiple calls for every request.
3. A more production-friendly implementation
It made the sliding window approach feel much less like a demo and much more like something I’d be comfortable using in a real distributed service.
This was probably my favorite part of the project because it changed the sliding window implementation from “technically correct” to “something that actually scales more gracefully.”
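For readers who have not written a Redis script before, here is a rough sketch of what such a script could look like, embedded as a string in TypeScript. It is not the script from the repository. The key prefix `ratelimit:sliding:`, the argument order, and the helper `buildSlidingWindowCall` are all assumptions; the Redis commands themselves (`ZREMRANGEBYSCORE`, `ZADD`, `ZCARD`, `PEXPIRE`) are standard.

```typescript
// Lua source that Redis would run atomically. Returns 1 (allow) or 0 (block).
const SLIDING_WINDOW_LUA = `
local key      = KEYS[1]
local now      = tonumber(ARGV[1])
local windowMs = tonumber(ARGV[2])
local limit    = tonumber(ARGV[3])
local member   = ARGV[4]

-- 1. Drop timestamps that have fallen out of the window.
redis.call('ZREMRANGEBYSCORE', key, 0, now - windowMs)
-- 2. Record this request; the member must be unique per request.
redis.call('ZADD', key, now, member)
-- 3. Count what is left in the window.
local count = redis.call('ZCARD', key)
-- 4. Let idle keys expire on their own.
redis.call('PEXPIRE', key, windowMs)

if count > limit then
  return 0
end
return 1
`;

// Everything a client needs to evaluate the script in a single round trip.
interface EvalCall {
  script: string;
  numKeys: number;
  args: (string | number)[];
}

// Hypothetical helper: builds the key and arguments for one request.
function buildSlidingWindowCall(
  clientKey: string,
  nowMs: number,
  windowMs: number,
  limit: number,
  requestId: string
): EvalCall {
  return {
    script: SLIDING_WINDOW_LUA,
    numKeys: 1,
    // First entry is KEYS[1]; the rest become ARGV[1..4].
    args: ["ratelimit:sliding:" + clientKey, nowMs, windowMs, limit, nowMs + "-" + requestId],
  };
}

const exampleCall = buildSlidingWindowCall("user:42", 1700000000000, 60000, 100, "req-1");
console.log(exampleCall.numKeys, exampleCall.args.length);
```

With a client that exposes an `eval(script, numKeys, ...args)` method (ioredis has one of that shape), the call would be roughly `redis.eval(call.script, call.numKeys, ...call.args)`. Passing the timestamp in from the caller keeps the script deterministic, at the cost of trusting the backend clocks to agree.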
3. Token Bucket
I also implemented Token Bucket to support smoother rate limiting behavior.
In this model:
- a bucket has a maximum token capacity
- tokens refill over time
- every request consumes one token
- if no tokens remain, the request is blocked
This is useful when you want to allow short bursts of traffic without removing limits entirely.
Compared to fixed windows, it feels much more natural for APIs where occasional bursts are acceptable but sustained abuse is not.
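A minimal in-memory sketch of the model described above, assuming a continuous refill rate expressed in tokens per second. The class name `TokenBucketLimiter`, its parameters, and the injected `nowMs` clock are illustrative choices rather than the project's API.

```typescript
interface Bucket {
  tokens: number;
  lastRefillMs: number;
}

class TokenBucketLimiter {
  private buckets: { [key: string]: Bucket | undefined } = {};
  private capacity: number;
  private refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
  }

  allow(key: string, nowMs: number): boolean {
    let bucket = this.buckets[key];
    if (bucket === undefined) {
      // New clients start with a full bucket.
      bucket = { tokens: this.capacity, lastRefillMs: nowMs };
      this.buckets[key] = bucket;
    }
    // Refill in proportion to elapsed time, never beyond capacity.
    const elapsedSeconds = (nowMs - bucket.lastRefillMs) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.lastRefillMs = nowMs;
    // Every request consumes one token; with none left, the request is blocked.
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}

// Capacity 5, refilling 1 token per second.
const bucketLimiter = new TokenBucketLimiter(5, 1);
let burstAllowed = 0;
for (let i = 0; i < 7; i++) {
  if (bucketLimiter.allow("client-1", 0)) burstAllowed++; // burst of 7 at t = 0
}
let laterAllowed = 0;
for (let i = 0; i < 3; i++) {
  if (bucketLimiter.allow("client-1", 2000)) laterAllowed++; // 3 more, two seconds later
}

console.log(burstAllowed, laterAllowed); // 5 2
```

The burst of 5 goes through immediately, and after two seconds only the 2 refilled tokens are available.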
Dynamic Rule Engine
One of the things I didn’t want was a hardcoded rate limiter with a single global limit.
So I built a dynamic rule engine where rules are stored in MongoDB and managed through an admin dashboard.
Each rule can define things like:
- target → ip or user
- scope → global or endpoint
- algorithm → fixed window / sliding window / token bucket
- limit and window configuration
- active / inactive state
That makes the system much more flexible because different limits can be applied to different scenarios instead of forcing one blanket rule across everything.
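As an illustration, a rule with the fields listed above could be typed roughly like this. The field names, the `windowSeconds` unit, and the `validateRule` helper are assumptions for the sketch and may not match the actual MongoDB schema.

```typescript
type RuleTarget = "ip" | "user";
type RuleScope = "global" | "endpoint";
type RuleAlgorithm = "fixed_window" | "sliding_window" | "token_bucket";

// Hypothetical shape of one stored rule.
interface RateLimitRule {
  target: RuleTarget;
  scope: RuleScope;
  endpoint?: string; // only meaningful when scope is "endpoint"
  algorithm: RuleAlgorithm;
  limit: number; // max requests (or bucket capacity for token bucket)
  windowSeconds: number; // window length (or refill period for token bucket)
  active: boolean;
}

// Returns a list of problems; an empty list means the rule looks usable.
function validateRule(rule: RateLimitRule): string[] {
  const problems: string[] = [];
  if (rule.limit <= 0) problems.push("limit must be positive");
  if (rule.windowSeconds <= 0) problems.push("windowSeconds must be positive");
  if (rule.scope === "endpoint" && !rule.endpoint) {
    problems.push("endpoint-scoped rules need an endpoint");
  }
  return problems;
}

const searchRule: RateLimitRule = {
  target: "user",
  scope: "endpoint",
  endpoint: "/api/search",
  algorithm: "sliding_window",
  limit: 100,
  windowSeconds: 60,
  active: true,
};

const brokenRule: RateLimitRule = {
  target: "ip",
  scope: "endpoint",
  algorithm: "fixed_window",
  limit: 0,
  windowSeconds: 60,
  active: true,
};

console.log(validateRule(searchRule).length, validateRule(brokenRule).length); // 0 2
```

Validating rules before they are saved matters more once limits can change at runtime, since a bad rule takes effect without a deploy to catch it.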
Rule Resolution Priority
When multiple rules match the same request, the most specific one wins. The priority order, from highest to lowest, is:
1. Endpoint + User
2. Endpoint + IP
3. Global + User
4. Global + IP
5. Fallback Rule
This allows the system to support more realistic cases.
For example:
- a premium user can have a higher limit on a specific endpoint
- anonymous traffic can be limited by IP
- everything else can still fall back to a global default rule
Fallback Protection
I also didn’t want a missing rule to mean unlimited access.
So if no specific rule matches a request, APIShield applies a safe fallback limit. That acts as a default protection layer and prevents accidental gaps in enforcement.
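The priority order and the fallback can be sketched together as a small resolver. This is a simplified illustration under a few assumptions: rules are already loaded into an array, a user rule only applies when the request carries a user ID, and the fallback of 60 requests is an invented value. The names `LimitRule`, `RequestContext`, and `resolveRule` are hypothetical.

```typescript
interface LimitRule {
  target: "ip" | "user";
  scope: "global" | "endpoint";
  endpoint?: string;
  limit: number;
  active: boolean;
}

interface RequestContext {
  ip: string;
  userId?: string; // undefined for anonymous traffic
  endpoint: string;
}

// Hypothetical safe default, applied when nothing else matches.
const FALLBACK_RULE: LimitRule = { target: "ip", scope: "global", limit: 60, active: true };

// Lower number = higher priority:
// endpoint+user (0), endpoint+ip (1), global+user (2), global+ip (3).
function rulePriority(rule: LimitRule): number {
  const scopeRank = rule.scope === "endpoint" ? 0 : 2;
  const targetRank = rule.target === "user" ? 0 : 1;
  return scopeRank + targetRank;
}

function ruleMatches(rule: LimitRule, ctx: RequestContext): boolean {
  if (!rule.active) return false;
  if (rule.target === "user" && ctx.userId === undefined) return false;
  if (rule.scope === "endpoint" && rule.endpoint !== ctx.endpoint) return false;
  return true;
}

function resolveRule(rules: LimitRule[], ctx: RequestContext): LimitRule {
  const candidates = rules.filter((rule) => ruleMatches(rule, ctx));
  candidates.sort((a, b) => rulePriority(a) - rulePriority(b));
  const best = candidates[0];
  // A missing rule never means unlimited access.
  return best !== undefined ? best : FALLBACK_RULE;
}

const exampleRules: LimitRule[] = [
  { target: "ip", scope: "global", limit: 100, active: true },
  { target: "ip", scope: "endpoint", endpoint: "/login", limit: 5, active: true },
  { target: "user", scope: "endpoint", endpoint: "/login", limit: 20, active: true },
  { target: "user", scope: "global", limit: 500, active: false },
];

const loggedInLogin = resolveRule(exampleRules, { ip: "10.0.0.7", userId: "u1", endpoint: "/login" });
const anonymousLogin = resolveRule(exampleRules, { ip: "10.0.0.7", endpoint: "/login" });
const anonymousSearch = resolveRule(exampleRules, { ip: "10.0.0.7", endpoint: "/search" });
const noRules = resolveRule([], { ip: "10.0.0.7", endpoint: "/search" });

console.log(loggedInLogin.limit, anonymousLogin.limit, anonymousSearch.limit, noRules.limit); // 20 5 100 60
```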
Violation Tracking
Blocking requests is useful, but I also wanted visibility into who was repeatedly crossing limits.
So I added violation tracking in Redis for:
- per-user violations
- per-IP violations
This made it possible to surface analytics in the dashboard and inspect how the system was actually being used.
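A rough in-memory sketch of the idea, with a plain object in place of Redis. In Redis, each counter would typically be a key bumped with `INCR`; the key format `violations:<kind>:<id>` and the helper names here are assumptions for illustration.

```typescript
type ViolationCounts = { [key: string]: number | undefined };

// Increment the violation counter for one user or IP and return the new total.
function recordViolation(counts: ViolationCounts, kind: "user" | "ip", id: string): number {
  const key = "violations:" + kind + ":" + id;
  const next = (counts[key] ?? 0) + 1;
  counts[key] = next;
  return next;
}

// Return the `n` keys with the most violations, highest first.
function topOffenders(counts: ViolationCounts, n: number): { key: string; count: number }[] {
  return Object.keys(counts)
    .map((key) => ({ key: key, count: counts[key] ?? 0 }))
    .sort((a, b) => b.count - a.count)
    .slice(0, n);
}

const violationCounts: ViolationCounts = {};
recordViolation(violationCounts, "user", "42");
recordViolation(violationCounts, "user", "42");
recordViolation(violationCounts, "user", "42");
recordViolation(violationCounts, "ip", "10.0.0.7");

const worstOffender = topOffenders(violationCounts, 1)[0];
console.log(worstOffender);
```

A dashboard can then read the top few keys rather than scanning every violation event.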
Admin Dashboard
The project also includes an admin dashboard where rules can be:
- created
- updated
- deleted
- toggled on/off
and where rate limit violations can be monitored.
I liked this part because it turned the project from “a rate limiting middleware” into something that felt more like an internal platform.
Dockerized setup
The whole system runs with Docker Compose using separate services for:
- redis
- mongo
- backend1
- backend2
- dashboard
This made it much easier to validate distributed behavior locally.
One of the most satisfying tests was applying a global limit, sending alternating requests to both backend instances, and watching the 6th request get blocked even though the requests were split across servers.
That was the point where the system actually felt distributed instead of just pretending to be.
What this project taught me
APIShield taught me a lot more than just how to implement rate limiting.
It forced me to think about:
- how state should be shared across multiple backend instances
- when in-memory logic stops being enough
- how to reduce race conditions in distributed systems
- when to move logic closer to the data store
- how to design backend systems that are configurable instead of hardcoded
It also reminded me that some of the most interesting engineering problems aren’t always flashy user-facing features. Sometimes they’re the quiet infrastructure pieces that keep everything else stable.
Final thoughts
Rate limiting is one of those things users rarely notice when it works well, but systems definitely notice when it doesn’t.
Building APIShield gave me hands-on experience with distributed backend design, Redis scripting, and the tradeoffs behind different rate limiting strategies. It also made me appreciate how much engineering depth can hide behind a feature that, on the surface, sounds as simple as “block requests after a certain limit.”
If I continue iterating on this project, I’d love to explore:
- richer analytics dashboards
- per-plan throttling for SaaS-style use cases
- Redis Cluster support
- stronger observability and failure-mode handling
- benchmarking different strategies under load
If you’ve built something similar, or would approach rate limiting differently, I’d love to hear your thoughts.
If you want to explore the implementation in more detail, the project is here:
GitHub: APIShield