Sensors: The Other Half of the Harness

The pre-commit hook caught a migration last Tuesday that would have shipped to staging green and broken production on the next deploy. The migration dropped a foreign-key constraint without naming a backfill plan, and the rule that said every constraint change needs a backfill plan had been sitting in migrations.md for four months. The rule didn’t stop the agent. It didn’t stop the human reviewer either: the diff was 38 files, and the constraint drop was a single line. What stopped it was a four-line shell script wired to pre-commit that grepped the staged migration for DROP CONSTRAINT and exited non-zero if no --backfill: comment followed it.
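For the shape of that check, here is a rough Python equivalent rather than the original shell. It assumes the --backfill: comment has to sit on the next non-blank line after the DROP CONSTRAINT statement; the function names and that adjacency convention are illustrative, not a standard.

```python
import re
import sys

DROP_CONSTRAINT = re.compile(r"\bDROP\s+CONSTRAINT\b", re.IGNORECASE)


def missing_backfill(sql: str) -> list[int]:
    """1-based line numbers of DROP CONSTRAINT lines not followed by a --backfill: comment."""
    lines = sql.splitlines()
    offenders = []
    for i, line in enumerate(lines):
        if line.lstrip().startswith("--"):
            continue  # comments are annotations, not statements
        if DROP_CONSTRAINT.search(line):
            # Assumed convention: the plan lives on the next non-blank line.
            nxt = next((n.strip() for n in lines[i + 1:] if n.strip()), "")
            if not nxt.lower().startswith("--backfill:"):
                offenders.append(i + 1)
    return offenders


def main(paths: list[str]) -> int:
    """Hook exit status: 0 when every file is clean, 1 when any violates the rule."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for lineno in missing_backfill(fh.read()):
                print(f"{path}:{lineno}: DROP CONSTRAINT without a --backfill: comment",
                      file=sys.stderr)
                status = 1
    return status
```

Wired to pre-commit, the hook would receive the staged migration paths as arguments and end with `sys.exit(main(sys.argv[1:]))`.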

That four-line script is what I want to talk about. Most of what gets written about agent harnesses is about rules — what CLAUDE.md should say, where to put it, how to scope it, when to version it. Rules are half the harness. The other half is the set of checks that fire when a rule gets broken. I call them sensors, and they get talked about a lot less than they should.

The asymmetry in the conversation

Read any of the well-circulated posts on agent harnesses and count the words spent on each side. Rules are the centerpiece. Sensors are glossed over, as if they’re so obvious they don’t deserve their own treatment. They aren’t obvious, and the assumption that they’re already there is the most expensive one in the field.

The asymmetry has a reason. Rules are easier to write. A rule is a paragraph. A sensor is a script, a hook, a CI step, a pre-commit config, a custom check. Rules sit at the cognitive level a writer naturally operates at; sensors sit at the level of plumbing. The first one is a lot more fun to write than the second.

The asymmetry also has a cost. Rules without sensors are vibes. The agent reads them, claims to follow them, and gets graded on whether the resulting code looks like it followed them. “Looks like” is the failure mode. The agent is a probabilistic system. It will obey a rule most of the time and skip it some of the time, and the times it skips are the times the rule mattered most: the awkward branch, the migration nobody likes to think about, the edge case the rule was written to catch.

A sensor changes the contract. The rule is no longer a hope. It’s a falsifiable check.

What a sensor actually is

A sensor is anything that can detect whether a rule was followed and signal that detection in a way the workflow respects. The shape is narrow:

It runs deterministically.
It returns a clear pass or fail.
It fires at a point in the workflow where its result still matters.
It’s cheap enough that nobody is tempted to skip it.

Most of the sensors in a working harness aren’t fancy. A grep with set -e. A line in a linter config. A pytest fixture that asserts the database fixture hasn’t drifted. A pre-commit hook that runs mypy on changed files. A GitHub Actions job that fails the PR if the migration directory has a file without a paired rollback. None of these is impressive on its own. The point isn’t the individual sensor; it’s the discipline of having one for every rule that matters.
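To make the last example concrete, a minimal sketch of the paired-rollback check. It assumes migrations follow a `<version>_<name>.up.sql` / `<version>_<name>.down.sql` naming convention (some migration tools use it; yours may not), and the function names are hypothetical.

```python
from pathlib import Path


def unpaired_migrations(filenames: list[str]) -> list[str]:
    """Every *.up.sql file that has no matching *.down.sql rollback."""
    names = set(filenames)
    missing = []
    for name in sorted(names):
        if name.endswith(".up.sql"):
            rollback = name[: -len(".up.sql")] + ".down.sql"
            if rollback not in names:
                missing.append(name)
    return missing


def check_directory(directory: str) -> list[str]:
    """Apply the rule to the files actually present in a migrations directory."""
    return unpaired_migrations([p.name for p in Path(directory).iterdir() if p.is_file()])
```

The CI job would call `check_directory("migrations")` and fail the PR whenever the returned list is non-empty.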

Five places a sensor fires

There are five workflow positions where sensors earn their pay. Each catches a different class of failure. A working harness has sensors at most of them.

Pre-edit

The earliest sensor is the one the agent encounters before it writes code. The rule is loaded into context; the sensor is something the agent can run to check its own work before producing output. A type stub generator, a schema dump, a --dry-run command that simulates the change. The agent doesn’t need approval; it just needs to see what its proposed change would do.

Pre-edit sensors are the rarest of the five, because they require giving the agent tools, not just rules. The payoff is that they catch errors before any bytes hit disk. The agent that can run prisma validate on a proposed schema change is going to make fewer broken commits than the agent that can only read a rule that says “make sure your schema is valid.”
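Agent hooks are one way to give the agent a pre-edit check. The sketch below assumes Claude Code’s hook contract as I understand it: the hook receives the proposed tool call as JSON on stdin, with `tool_name` and `tool_input` fields, and exit code 2 blocks the call and shows stderr to the agent. Verify those details against the current hooks documentation. The rule itself (never hand-edit `src/generated/`) and its ID are hypothetical.

```python
import json
import sys

# Hypothetical rule CODEGEN-01: files under src/generated/ are rebuilt by a
# generator and must never be edited by hand.
PROTECTED = "src/generated/"
BLOCK_EXIT = 2  # assumed: the exit code the agent runtime treats as "block this call"


def check_tool_call(payload: dict) -> tuple[int, str]:
    """Return (exit code, message) for a proposed tool call, before anything is written."""
    if payload.get("tool_name") not in {"Edit", "Write"}:
        return 0, ""
    path = payload.get("tool_input", {}).get("file_path", "")
    if PROTECTED in path:
        return BLOCK_EXIT, (
            f"{path} is generated code (rule CODEGEN-01). "
            "Change the source schema and rerun the generator instead."
        )
    return 0, ""


def main() -> int:
    code, message = check_tool_call(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)  # assumed: stderr is shown to the agent on block
    return code
```

Registered as a PreToolUse hook for the Edit and Write tools, the script would end with `sys.exit(main())`. The agent sees the message and can correct course without a human in the loop.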

Pre-commit

Pre-commit is where most teams put their first sensor, because Git makes it easy. A .pre-commit-config.yaml or a husky hook fires the moment the agent (or a human) tries to land a change. The check has the staged diff to look at, the rest of the repo for context, and a hard exit code that the workflow respects.

This is the sensor layer for rules that constrain the artifact: what the code looks like, not what it does. Format, lint, type, dead code, banned imports, naming patterns, the migration backfill check above. Cheap, fast, local. The agent that trips a pre-commit hook either fixes the violation or doesn’t commit. Either outcome beats “the agent commits and nobody notices.”
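A minimal sketch of an artifact-level pre-commit check, here a banned-import rule. It reads the staged version of each file with `git show :<path>`, so the check sees exactly what is about to be committed rather than the working tree. The banned module `legacy_http`, the “shared client,” and rule ID `DEPS-02` are hypothetical.

```python
import re
import subprocess

# Hypothetical rule DEPS-02: application code must not import legacy_http.
BANNED = re.compile(
    r"^[ \t]*(?:from[ \t]+legacy_http\b|import[ \t]+legacy_http\b)", re.MULTILINE
)


def staged_python_files() -> list[str]:
    """Paths added, copied, or modified in the index (the staged diff)."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.splitlines() if p.endswith(".py")]


def staged_text(path: str) -> str:
    """The staged blob for a path; ':<path>' names the index entry."""
    return subprocess.run(
        ["git", "show", f":{path}"], capture_output=True, text=True, check=True
    ).stdout


def banned_import_lines(source: str) -> list[int]:
    """1-based line numbers that import the banned module."""
    return [source.count("\n", 0, m.start()) + 1 for m in BANNED.finditer(source)]


def main() -> int:
    status = 0
    for path in staged_python_files():
        for lineno in banned_import_lines(staged_text(path)):
            print(f"{path}:{lineno}: legacy_http is banned (rule DEPS-02); use the shared client.")
            status = 1
    return status
```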

Pre-PR

A pre-PR sensor is one that runs after the commits exist but before the diff goes up for review. The check has more to work with: a full branch, a base ref, a set of changed files that’s no longer one commit at a time. It’s the right layer for cross-file checks. Did you change this API without updating its consumers? Did you add a new migration without bumping the schema version? Does test coverage drop by more than two points?
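One of those cross-file questions, sketched under assumptions: migrations live in `migrations/`, the schema version lives in a `schema_version.txt` file, and the base branch is `origin/main`. All three names are hypothetical. The three-dot form `git diff --name-status base...HEAD` lists what changed on the branch since it diverged from the base.

```python
import subprocess

MIGRATIONS_DIR = "migrations/"              # hypothetical layout
SCHEMA_VERSION_FILE = "schema_version.txt"  # hypothetical version file


def branch_changes(base: str = "origin/main") -> list[tuple[str, str]]:
    """(status, path) for every file changed on this branch since it left `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-status", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    changes = []
    for line in out.splitlines():
        fields = line.split("\t")
        # Renames and copies list old and new paths; keep the new one.
        changes.append((fields[0], fields[-1]))
    return changes


def migration_without_version_bump(changes: list[tuple[str, str]]) -> bool:
    """True when the branch adds a migration but leaves the schema version untouched."""
    added_migration = any(s.startswith("A") and p.startswith(MIGRATIONS_DIR) for s, p in changes)
    bumped = any(p == SCHEMA_VERSION_FILE for _, p in changes)
    return added_migration and not bumped


def main() -> int:
    if migration_without_version_bump(branch_changes()):
        print(f"New migration in {MIGRATIONS_DIR} without a change to {SCHEMA_VERSION_FILE}. "
              "Bump the schema version.")
        return 1
    return 0
```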

Pre-PR checks are cheap on CI and slow locally. Most teams run them on branch push, which catches the agent’s work right as it’s offering that work for human attention. The timing matters. A check that fires after the human has started reviewing is a check that’s been skipped.

Post-merge

Post-merge sensors are the ones that run against main. They aren’t strictly part of the agent’s loop, but they’re part of the harness because they detect when the loop produced something that broke once integrated. Smoke tests, end-to-end suites, schema-diff jobs that compare staging to production, query-plan monitors that fire when a slow query joins the rotation.
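A schema-diff job reduces to comparing two snapshots. This sketch assumes each environment’s schema has already been dumped into a `{table: {column: type}}` mapping (for example from `information_schema.columns`); fetching the dumps is left out, and the function name is hypothetical.

```python
Schema = dict[str, dict[str, str]]  # table -> {column: type}


def schema_drift(staging: Schema, production: Schema) -> list[str]:
    """Describe every table or column that differs between two schema snapshots."""
    problems = []
    for table in sorted(staging.keys() | production.keys()):
        if table not in production:
            problems.append(f"table {table} exists only in staging")
            continue
        if table not in staging:
            problems.append(f"table {table} exists only in production")
            continue
        s_cols, p_cols = staging[table], production[table]
        for col in sorted(s_cols.keys() | p_cols.keys()):
            s_type, p_type = s_cols.get(col), p_cols.get(col)
            if s_type != p_type:
                problems.append(f"{table}.{col}: staging={s_type} production={p_type}")
    return problems
```

The post-merge job would fail whenever the list is non-empty, and each line says exactly which table or column to look at.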

The instinct is to think of post-merge as CI and not as part of the harness. That’s a mistake. The agent’s behavior is shaped by what fails after it ships. A post-merge sensor that catches a regression and gets traced back to a missing rule is the most valuable kind of sensor in the whole stack. It tells you something the inner sensors missed, and it tells you what to add upstream.

Drift

The fifth and most-skipped sensor is the one that checks whether the rules themselves are still true. The codebase moves. The framework version bumps. The pattern the rule used to describe gets replaced by a new pattern. The rule is now wrong, and nobody notices because nothing fires.

A drift sensor is the check that wakes up periodically and asks: does this rule still match reality? Sometimes that’s a literal grep for the pattern the rule references. Sometimes it’s a count of how many files in the codebase still match the rule’s example. Sometimes it’s a last-modified audit that flags any rule older than six months for review. The cadence is monthly or quarterly; the goal is to keep the harness honest as the code beneath it changes.
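A drift sensor can be as small as this sketch. It assumes each rule records a regex for the code its example refers to and a last-reviewed date; that metadata, the `Rule` type, and the 180-day threshold (the six months above) are assumptions for illustration, not a required format.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Rule:
    rule_id: str         # e.g. "MIGRATIONS-04"
    pattern: str         # regex for the code the rule's example refers to (hypothetical field)
    last_reviewed: date


def drift_report(rules: list[Rule], files: dict[str, str], today: date,
                 max_age: timedelta = timedelta(days=180)) -> list[str]:
    """Flag rules whose example matches no files, or that are overdue for review."""
    findings = []
    for rule in rules:
        regex = re.compile(rule.pattern)
        hits = sum(1 for text in files.values() if regex.search(text))
        if hits == 0:
            findings.append(f"{rule.rule_id}: pattern {rule.pattern!r} matches no files; "
                            "the rule may describe code that no longer exists")
        if today - rule.last_reviewed > max_age:
            findings.append(f"{rule.rule_id}: last reviewed {rule.last_reviewed.isoformat()}, "
                            f"older than {max_age.days} days")
    return findings
```

Run from a monthly cron, the report becomes the agenda for the rule review.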

I’ll write a whole post about rule-rot soon. The short version: a drift sensor is the difference between a harness that’s six months old and helpful and a harness that’s six months old and lying.

The pairing rule

If a rule matters, it has a sensor. That’s the principle. It sounds rigid, and it is, on purpose.

The discipline works because it forces a real question every time a rule gets written: how would I know if this got violated? If the answer is “I’d see it in code review,” the rule is on probation. Code review is slow, expensive, and inconsistent. It is also the wrong layer for rules whose violations are mechanical.

If the answer is “the linter would catch it,” the rule probably doesn’t need to be a rule. The linter is the rule. Configure the linter and skip the prose.

The interesting cases are the rules where the answer is “nothing currently checks for this, but I could write a check.” Those are the rules that earn a sensor. The sensor doesn’t have to ship the same day the rule ships, but it should be on the list and have a date.
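The pairing rule can itself be checked. A hedged sketch, assuming rules in CLAUDE.md carry IDs like `MIGRATIONS-04`, sensors are registered in a mapping from rule ID to script, and planned sensors have due dates. The ID format, registry shape, and function names are hypothetical. It also reports the reverse case: a sensor whose rule isn’t written down.

```python
import re
from datetime import date

# Assumed ID format for rules in CLAUDE.md, e.g. "MIGRATIONS-04".
RULE_ID = re.compile(r"\b[A-Z][A-Z0-9]*-\d+\b")


def rule_ids_in(text: str) -> set[str]:
    """Collect every rule ID mentioned in a rules file."""
    return set(RULE_ID.findall(text))


def pairing_audit(rule_ids: set[str], sensors: dict[str, str],
                  planned: dict[str, date], today: date) -> list[str]:
    """Rules need a sensor or a future due date; sensors need a written rule."""
    problems = []
    for rule_id in sorted(rule_ids):
        if rule_id in sensors:
            continue
        due = planned.get(rule_id)
        if due is None:
            problems.append(f"{rule_id}: no sensor and no planned date")
        elif due < today:
            problems.append(f"{rule_id}: sensor was due {due.isoformat()}")
    for rule_id in sorted(sensors.keys() - rule_ids):
        problems.append(f"{sensors[rule_id]}: enforces {rule_id}, which is not written down")
    return problems
```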

The rules that aren’t worth a sensor are the rules that aren’t worth writing down. “Be thoughtful” is not a rule. “Bullet points should be sentence fragments unless they form a list of full sentences” is also not a rule: nothing catches a violation, and writing it down costs more than fixing the rare slip in editing.

A rule with no sensor is either too vague to matter or important enough that someone should pair it within the month. Either way, the absence of a sensor is information.

What good sensors share

The sensors that survive in production look alike. They share five traits.

They run in seconds. A pre-commit hook that takes 90 seconds gets bypassed within two weeks. A pre-PR check that takes 12 minutes gets the --skip-checks shortcut added to the team Slack. The check has to be fast enough that running it is cheaper than figuring out how to avoid running it.

They produce clear messages. A failed sensor that prints EXIT 1 is about to get ignored. A failed sensor that prints “migrations/20260620_drop_users_index.sql dropped an index without a paired backfill comment. Add a comment starting with --backfill: explaining the plan, or annotate the migration with --no-backfill-needed:” teaches the agent (or the human) how to fix the violation.

They run locally. A check that only exists in CI is a check that the agent finds out about ten minutes after it pushed. Running the same check locally before push closes the loop. The pre-commit framework and npm run check-style scripts both make this easy; the discipline is to make sure CI runs the same scripts the developer can run.

They cite the rule they enforce. Every sensor message that earns its keep names the rule: “Failed: rule MIGRATIONS-04 (foreign-key changes require a backfill plan).” The agent reads the message; if the rule is wrong, the agent can find it and propose a change. The sensor is the gate, but the rule is the explanation.

They fail loud and pass quiet. A sensor that prints “OK checked 47 migration files in 1.2s” on every commit is a sensor whose output everyone learns to ignore. Pass silently. Fail with a paragraph. Save the human’s attention for the cases that need it.
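The last three traits fit in a small runner. This sketch assumes each sensor is a Python callable returning a list of findings; the `Finding` fields and the message layout are illustrative, modeled on the MIGRATIONS-04 message above.

```python
import sys
from dataclasses import dataclass
from typing import Callable, TextIO


@dataclass
class Finding:
    rule_id: str   # the rule being enforced, e.g. "MIGRATIONS-04"
    summary: str   # what the rule says, in one line
    location: str  # where it was violated
    fix: str       # how to fix it


Sensor = Callable[[], list[Finding]]


def run_sensors(sensors: list[Sensor], out: TextIO = sys.stderr) -> int:
    """Quiet on success; one explanatory paragraph per finding on failure."""
    findings = [f for sensor in sensors for f in sensor()]
    for f in findings:
        print(f"Failed: rule {f.rule_id} ({f.summary})\n  at: {f.location}\n  fix: {f.fix}",
              file=out)
    return 1 if findings else 0
```

Because the runner prints nothing on a clean pass, any output at all is a signal worth reading.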

Where sensors live in the project

A working sensor layout has four physical homes. Each holds the sensors for a different workflow position.

.husky/ # or .pre-commit-config.yaml
  pre-commit # fast, staged-files only
  commit-msg # commit message linting
.github/workflows/
  pr-checks.yml # pre-PR, runs on push
  post-merge.yml # post-merge, runs on main
.claude/
  hooks/ # agent-facing pre-edit checks
  sensors/ # one shell script per drift check
scripts/check-*.sh # the actual checks, invoked from above

The shape isn’t sacred. What matters is that each sensor has a single home and the workflow knows where to call it. The anti-pattern is the same check running in three places with slightly different logic and the team unable to remember which one is authoritative.

The sensors themselves go in scripts/ (or wherever your project keeps tooling) and get called by whichever runner needs them. That separation means the same check can fire from pre-commit, from CI, and from a drift-audit cron without three copies. One script, three invocations.

Personally, I prefer to wrap these scripts in make targets. The benefit is that the agent always knows to reach for make, which makes every check easier to find and more consistent to run. I use make in CI too. I usually work with Docker, so the two pair naturally: Docker isolates the environment; make isolates the commands. Both make the repository easier for the human and the agent to work with.

Two sensors worth writing this month

The two highest-value sensors for a team starting from zero are both small.

The migration safeguard. Whatever your migration tool is, the team’s hardest-to-catch bugs ship through it. A pre-commit script that knows the patterns your migrations are supposed to avoid (DROP COLUMN without a paired backfill, ADD COLUMN NOT NULL without a default, schema-breaking renames without a two-phase plan) catches more production issues than any rule on its own. The agent will write migrations that look right and miss the constraint. The sensor sees what the human reviewer doesn’t.
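A sketch of that safeguard, checking the three patterns just listed. It assumes one statement per line and the convention from the hook at the top of this post: the excusing comment (`--backfill:`, or a hypothetical `--two-phase:`) sits on the next non-blank line. A real version would need a SQL parser to handle multi-line statements.

```python
import re

DROP_COLUMN = re.compile(r"\bDROP\s+COLUMN\b", re.IGNORECASE)
ADD_COLUMN = re.compile(r"\bADD\s+COLUMN\b", re.IGNORECASE)
NOT_NULL = re.compile(r"\bNOT\s+NULL\b", re.IGNORECASE)
DEFAULT = re.compile(r"\bDEFAULT\b", re.IGNORECASE)
RENAME = re.compile(r"\bRENAME\b", re.IGNORECASE)


def risky_statements(sql: str) -> list[tuple[int, str]]:
    """(line number, problem) for each statement matching a known-dangerous pattern."""
    lines = sql.splitlines()
    problems = []
    for i, line in enumerate(lines):
        if line.lstrip().startswith("--"):
            continue  # annotation or comment, not a statement
        # Assumed convention: the excusing comment is the next non-blank line.
        note = next((n.strip().lower() for n in lines[i + 1:] if n.strip()), "")
        if DROP_COLUMN.search(line) and not note.startswith("--backfill:"):
            problems.append((i + 1, "DROP COLUMN without a paired backfill"))
        if ADD_COLUMN.search(line) and NOT_NULL.search(line) and not DEFAULT.search(line):
            problems.append((i + 1, "ADD COLUMN NOT NULL without a default"))
        if RENAME.search(line) and not note.startswith("--two-phase:"):
            problems.append((i + 1, "rename without a two-phase plan"))
    return problems
```

A pre-commit entry point would print each problem with its path and line, then exit non-zero when any are found.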

The diff-budget gate. A pre-PR check that fails if the diff exceeds N files or M lines. The number is yours to pick. I use 15 files and 400 lines, which roughly maps to a PR a human can review in one sitting. The rule that says “keep PRs small” is exactly the kind of rule that gets nodded at and then forgotten within a month. The sensor that fails the build at file 16 is the rule with teeth.
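A sketch of the diff-budget gate using the 15-file and 400-line thresholds above, counting added plus deleted lines from `git diff --numstat`. The base ref `origin/main` is an assumption. Binary files, which numstat reports as `-`, count toward the file budget but not the line budget.

```python
import subprocess
from typing import Optional

MAX_FILES = 15   # thresholds from the text; pick your own
MAX_LINES = 400


def diff_size(numstat: str) -> tuple[int, int]:
    """Count files and changed lines (added + deleted) from `git diff --numstat` output."""
    files = lines = 0
    for row in numstat.splitlines():
        if not row.strip():
            continue
        added, deleted, _path = row.split("\t", 2)
        files += 1
        if added != "-":  # binary files report "-" for both counts
            lines += int(added) + int(deleted)
    return files, lines


def budget_violation(numstat: str) -> Optional[str]:
    files, lines = diff_size(numstat)
    if files > MAX_FILES or lines > MAX_LINES:
        return (f"Diff is {files} files / {lines} lines; budget is "
                f"{MAX_FILES} files / {MAX_LINES} lines. Split the PR.")
    return None


def main(base: str = "origin/main") -> int:
    numstat = subprocess.run(["git", "diff", "--numstat", f"{base}...HEAD"],
                             capture_output=True, text=True, check=True).stdout
    message = budget_violation(numstat)
    if message:
        print(message)
        return 1
    return 0
```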

Both are under fifty lines of shell. Both pay for themselves the first week.

The anti-patterns to watch

A few sensor failure modes recur often enough to name.

The unrunnable sensor. It lives in CI, takes 14 minutes, and depends on a secret only the deploy bot has. Nobody on the team can run it locally. By the time it fires, the human reviewer has already mentally signed off. The fix: factor out the locally runnable piece and run that on every commit.

The sensor that always passes. Someone wrote it months ago, the inputs changed, the assertions still hold, the failure case is no longer reachable. The sensor is a green light glued to the dashboard. The fix: every sensor needs at least one known-bad fixture that the test suite runs against it to confirm it still detects the violation it was written for.
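A minimal sketch of the known-bad fixture idea, in pytest style. The sensor here is a stand-in (a hypothetical NOT NULL-without-DEFAULT check); in a real repository the tests would import the function your check script exposes.

```python
def missing_default(sql: str) -> bool:
    """Hypothetical sensor under test: True when a NOT NULL column is added without a DEFAULT."""
    s = " ".join(sql.upper().split())  # normalize case and whitespace
    return "ADD COLUMN" in s and "NOT NULL" in s and "DEFAULT" not in s


# Fixtures kept next to the sensor: inputs it must flag, and inputs it must let through.
KNOWN_BAD = ["ALTER TABLE users ADD COLUMN plan text NOT NULL;"]
KNOWN_GOOD = ["ALTER TABLE users ADD COLUMN plan text NOT NULL DEFAULT 'free';"]


def test_sensor_still_detects_known_bad():
    for sql in KNOWN_BAD:
        assert missing_default(sql), f"sensor no longer flags: {sql}"


def test_sensor_passes_known_good():
    for sql in KNOWN_GOOD:
        assert not missing_default(sql), f"sensor wrongly flags: {sql}"
```

If someone later edits the sensor so it can no longer fire, the known-bad test goes red before the dashboard goes quietly green.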

The flaky sensor. It fails sometimes and passes the rest. The team learns to rerun. Within a month, the rerun habit has spread to the other checks, and a real failure gets bypassed because it looked flaky. The fix: a flaky sensor is broken. Pull it from the gate, fix it, or delete it. Do not leave it firing.

The rule with no sensor. The whole point of this post. A rule that lives in CLAUDE.md and has no check is a rule that gets followed when the agent feels like it. The fix is the pairing rule above: if the rule matters, pair it within a month.

The sensor with no rule. Less common but worth naming. A check fires, a developer doesn’t know why, the rule it’s enforcing isn’t written down. The check is correct on its own terms but unteachable. The fix: cite the rule in the failure message. If you can’t, write the rule.

Pair one rule to a sensor this week

Pick one rule from your CLAUDE.md — the one you find yourself reminding the agent about most. Open a file. Write a check. Wire it to pre-commit. Ship it.

The right rule for the exercise is one with a clear violation pattern. “Always use the Result type for fallible operations” is a fine candidate; you can grep the changed files for throw and warn. “Match the existing code style” is a poor candidate; the violation pattern is too diffuse for a small sensor to catch.
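A sketch of that `throw` check, assuming TypeScript sources and a unified diff such as `git diff --cached` produces. It scans only added lines, so existing throws don’t fire, and it warns rather than fails because a grep for `throw` will have false positives. Names are hypothetical.

```python
import re

# Hypothetical rule: fallible operations return a Result instead of throwing.
THROW = re.compile(r"\bthrow\b")


def added_throws(diff: str) -> list[str]:
    """Warnings for added lines in TypeScript files that contain `throw`."""
    warnings = []
    current = None
    for line in diff.splitlines():
        if line.startswith("+++ "):
            path = line[4:]
            current = path[2:] if path.startswith("b/") else path
        elif line.startswith("+") and current and current.endswith((".ts", ".tsx")):
            if THROW.search(line[1:]):
                warnings.append(f"{current}: added 'throw'; prefer returning a Result")
    return warnings


def main(diff: str) -> int:
    for warning in added_throws(diff):
        print(f"warning: {warning}")
    return 0  # warn only: the pattern is too noisy to block on
```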

When the sensor catches its first violation, two things happen. First, the rule becomes load-bearing: the agent now has to satisfy it instead of just acknowledging it. Second, you find out whether the rule was right. Sensors that fail a lot on changes that look reasonable in review are enforcing rules that need to change. The sensor is the feedback loop; the rule is the hypothesis.

Add a second one next month. By the end of the quarter your harness will feel different. Not because the rules are smarter, but because the rules can be checked.

The rules are the part everyone writes. The sensors are the part that makes them true.