The feature that wrote a database row every minute — even when nothing happened

My drift detector keeps a history so you can see a trend line: how much your AWS
diverged from Terraform over time, run by run. To draw that line I record a
DriftSnapshot after every scan, every tfstate import, every S3 sync.

Here's the part I didn't think through: I record one even when there's zero
drift. That's on purpose — a flat line is data too; "nothing drifted for three
weeks" is exactly what you want the chart to show. But it also means a scheduler
ticking every minute writes a row every minute, forever, whether or not anything
changed. Append-only, no ceiling. A slow-motion time bomb in a table nobody looks at.

Why I didn't notice for a while

The history view caps what it shows:

snapshots = list(DriftSnapshot.objects.filter(environment=env)[:100])

So the UI looked fine — a tidy 100 rows. Meanwhile the table underneath had tens
of thousands. The display limit was quietly hiding the growth from the one place
I'd have spotted it. The lesson that stuck: a LIMIT in your read path is not
retention. It bounds the query, not the data.

The fix has to live where the rows are born

My first instinct was a nightly cron that trims old rows. That works, but it
leaves the table unbounded between runs, and it's one more moving part to forget.
The rows all come from a single function — every scan/import/sync funnels through
the same _record_drift_snapshot(). So the cleanest place to enforce a ceiling is
right there, at the moment of creation:

snapshot = DriftSnapshot.objects.create(
    environment=environment,
    source=source,
    changed_count=len(changed),
    added_count=len(added),
    unchanged_count=unchanged,
    detail={'changed': changed, 'added': added},
)
# we append a row every run (even at zero drift) — trim the oldest beyond the cap
DriftSnapshot.prune(environment)
return snapshot

One chokepoint, so the table is bounded continuously — not "eventually, when the
cron next fires."

Keeping the newest N, per environment

The prune itself is two queries. Find the PKs of the newest N you want to keep,
then delete everything else for that environment:

@classmethod
def prune(cls, environment, keep=None):
    if keep is None:
        from django.conf import settings
        keep = getattr(settings, 'DRIFT_SNAPSHOT_RETENTION', 0)
    if not keep or keep <= 0:
        return 0  # 0 / unset = unlimited, opt out cleanly

    keep_ids = list(
        cls.objects.filter(environment=environment)
        .order_by('-detected_at')
        .values_list('pk', flat=True)[:keep]
    )
    deleted, _ = (
        cls.objects.filter(environment=environment)
        .exclude(pk__in=keep_ids)
        .delete()
    )
    return deleted

Two things I'd flag if you copy this:

Scope by environment. The cap is "newest N per environment," not N total. A global [:keep] would let a busy prod environment evict a quiet staging one's entire history.
.values_list('pk', ...)[:keep] then .exclude(pk__in=...). The slice becomes a LIMIT in SQL, so you never pull thousands of rows into Python just to decide what to delete. For per-write pruning where keep is small, that matters.

The cap is a setting, defaulting to something sane, with <= 0 meaning "unlimited"
so anyone who actually wants the full append-only log can opt out:

DRIFT_SNAPSHOT_RETENTION = int(os.getenv('DRIFT_SNAPSHOT_RETENTION', '500') or '0')

Don't forget the rows you already have

The chokepoint guard fixes new growth. It does nothing for the table that's
already bloated — those old rows were written before the cap existed and will sit
there until something deletes them. So I added a management command to backfill the
cleanup across every environment, with a dry run because deleting history deserves a
look-before-you-leap:

$ python manage.py prune_drift_snapshots --dry-run
prod:    would delete 41,902 (of 42,402)
staging: would delete 0 (of 88)
Done: would delete 41,902 snapshot(s), keeping 500 per environment.

$ python manage.py prune_drift_snapshots
prod: deleted 41,902 (kept 500)
Done: deleted 41,902 snapshot(s), keeping 500 per environment.

Same prune() underneath, just iterated over all environments — the per-write
guard and the bulk cleanup share one implementation.

One nice side effect: none of this needed a migration. It's a classmethod and a
call site, not a schema change. The existing rows are fine; they just stop being
immortal.

Takeaways

An "append-only history" feature that records even when nothing changed is unbounded by design. Decide its retention the day you build it, not the day the disk fills.
A LIMIT/[:N] in your read path bounds the query, not the table. It will happily hide unbounded growth from your own UI.
Enforce the ceiling at the single point where rows are created, so the table is bounded continuously — not at a periodic job that leaves gaps.
Ship a separate cleanup command for the rows that accumulated before the cap, and give it a --dry-run. New-growth guard and historical backfill aren't the same job.

This is the drift-history view in a self-hosted tool that tracks how your live AWS
drifts from Terraform over time — open source (MIT), one docker compose up:
syncvey.com. How do you handle append-only tables that exist
to be charted — hard cap, time-based TTL, or roll-ups into a summary table?

The feature that wrote a database row every minute — even when nothing happened

Why I didn't notice for a while

The fix has to live where the rows are born

Keeping the newest N, per environment

Don't forget the rows you already have

Takeaways

Tags

Author

Stats

Published

You Might Also Like

I added TOTP 2FA to my Django app in ~40 lines and no 2FA library — but one line decides whether it's real

From Feature Delivery to Platform Engineering.

From Feature Delivery to Platform Engineering: Scaling Earth Observation Pipelines.

How I Architected Automatic Parking Detection in Django — Bluetooth Disconnects, Geofence Events, and a Strict State Machine

How I Built a Carbon Footprint Tracker with Django + NVIDIA NIM

Your Django background scheduler is probably running twice. Here's why, and the migrate chicken-and-egg.