Your AI reply engine doesn't need a humor classifier

The fix for a tone-deaf AI reply engine is not humor detection. It is a better epistemic posture inside the existing prompt.

Here is what was happening. Each of the five reply engines I run (X, Threads, Bluesky, LinkedIn, and my own tooling) fed the parent post into the prompt as a claim to be evaluated. The instruction was effectively: here is what this person said, respond to it. The model, being a diligent corrector, treated every input as a sincere assertion and fired back.

A joke post about LLMs replacing frontend devs by Q3 got a detailed technical breakdown of why the timeline was off. A satire thread got earnest engagement. Posts that were clearly performative got three-paragraph rebuttals. The replies were not factually wrong. They were socially incoherent.

The obvious fix: add a humor detection step before generating the reply. Run a separate classifier call, get a "sincere or joke" label, branch on it. More calls, more latency, more moving parts to break.

That is the wrong fix.

The model was not failing to detect comedy. It was failing to ask whether the parent post was even making a sincere claim before deciding to respond as a corrector. Those are different problems with different solutions.

The actual fix was one prompt fragment added as a silent step 0 inside the existing claude -p call. The function is humor_aware_directive() in arihantdeva_core.humor. It tells the model to first read whether the parent is sincere or performative before choosing a register. If it lands on comedy, the instruction is specific: soften, drop the corrective posture, acknowledge the joke once, then still land a substantive point. Not silence. Not pure register matching, which just produces "ha, yes, exactly" noise. One grounded observation, lighter touch.
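
A minimal sketch of the shape such a directive might take. Only the function name humor_aware_directive comes from the real module; the fragment's wording, the marker string, and the build_reply_prompt helper are assumptions made up for illustration (the phrase "acknowledge the joke lightly" is the one piece of the instruction quoted in this post).

```python
# Hypothetical marker string so a test can confirm the directive reached a prompt.
HUMOR_DIRECTIVE_MARKER = "[step-0: read register]"


def humor_aware_directive() -> str:
    """Return the silent step-0 fragment (illustrative wording, not the real text)."""
    return (
        f"{HUMOR_DIRECTIVE_MARKER}\n"
        "Before drafting, silently decide whether the parent post is a sincere "
        "claim or a performative one (joke, satire, bit).\n"
        "If it is performative: soften, drop the corrective posture, "
        "acknowledge the joke lightly, then still make one substantive point.\n"
        "Do not mention this step in the reply."
    )


def build_reply_prompt(parent_post: str) -> str:
    """Hypothetical prompt builder: directive first, then the parent post and task."""
    return (
        f"{humor_aware_directive()}\n\n"
        f"Parent post:\n{parent_post}\n\n"
        "Write a reply."
    )
```

Calling build_reply_prompt("LLMs will replace all frontend devs by Q3.") yields a single prompt string in which the register check precedes the task, so the model does the read and the reply in one generation.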

Zero added LLM calls. No classifier gate. No application-layer branching. The humor read happens inside the same generation call, as part of the model's own reasoning step.
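
To make the call-count difference concrete, here is a rough sketch that runs both designs against a stub model that records every prompt it receives. The LLM type alias, both reply functions, the prompt wording, and the stub are hypothetical stand-ins, not code from any of the engines.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical interface: prompt in, completion out


def reply_with_classifier_gate(parent: str, llm: LLM) -> str:
    """The rejected design: one call to label the post, a second to reply."""
    label = llm(f"Label this post as SINCERE or JOKE:\n{parent}").strip().upper()
    if label == "JOKE":
        return llm(f"Reply playfully, with one substantive point:\n{parent}")
    return llm(f"Reply to this claim:\n{parent}")


def reply_with_step_zero(parent: str, llm: LLM) -> str:
    """The adopted design: the register read rides inside the single call."""
    directive = (
        "First, silently judge whether the post is sincere or performative, "
        "then choose a register."
    )
    return llm(f"{directive}\n\nParent post:\n{parent}\n\nWrite a reply.")


def make_counting_stub(log: List[str]) -> LLM:
    """Build a stub model that appends each prompt it sees to `log`."""
    def stub(prompt: str) -> str:
        log.append(prompt)
        return "JOKE" if prompt.startswith("Label") else "stub reply"
    return stub
```

With the stub, the gated path logs two prompts per reply and the step-0 path logs one, which is the whole latency and failure-surface argument in miniature.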

Wiring this into five separate engines without behavior divergence was the real engineering problem. Each reply path had its own prompt construction: generate_comment, _generate, draft_comment, draft_reply, _comment_prompt. Patching each independently means they drift as prompts evolve. The right move was to splice the directive into the shared base prompt that critic retries already inherit. One definition. Five consumers. If it changes, it changes everywhere.
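
Roughly, the splice looks like the sketch below. The five function names are the reply paths named above, but their signatures and bodies here are invented for illustration, as are base_reply_prompt and the directive text; the real functions live in separate engines and do more than build a string.

```python
def humor_aware_directive() -> str:
    # Stand-in text; the real fragment lives in arihantdeva_core.humor.
    return "[step-0: read register] Decide sincere vs performative before replying."


def base_reply_prompt(parent_post: str, task: str) -> str:
    """Single splice point: every engine (and any retry) builds from here."""
    return f"{humor_aware_directive()}\n\nParent post:\n{parent_post}\n\n{task}"


# Five thin consumers. None of them mentions the directive directly,
# so there is nothing per-engine to keep in sync.
def generate_comment(parent_post: str) -> str:
    return base_reply_prompt(parent_post, "Write a short comment.")


def _generate(parent_post: str) -> str:
    return base_reply_prompt(parent_post, "Write a reply.")


def draft_comment(parent_post: str) -> str:
    return base_reply_prompt(parent_post, "Draft a comment for review.")


def draft_reply(parent_post: str) -> str:
    return base_reply_prompt(parent_post, "Draft a reply for review.")


def _comment_prompt(parent_post: str) -> str:
    return base_reply_prompt(parent_post, "Write a professional comment.")


ENGINES = (generate_comment, _generate, draft_comment, draft_reply, _comment_prompt)
```

Editing humor_aware_directive() in this layout changes all five prompts at once, which is the property the shared base prompt is meant to buy.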

The unit test checks that the directive marker string reaches each engine's built reply prompt. Not a behavioral test. A wiring test. If someone removes the splice point or forks the base prompt and forgets to carry the directive, the test catches it. That is the only guarantee worth having here.
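
A wiring test of this kind can be sketched as follows. The marker string, the BUILDERS registry, and the stand-in base_reply_prompt are assumptions; in a real suite the registry would point at each engine's actual prompt builder.

```python
import unittest
from typing import Callable, Dict, List

MARKER = "[step-0: read register]"  # hypothetical directive marker


def base_reply_prompt(parent_post: str) -> str:
    """Stand-in for the shared base prompt that carries the directive."""
    return f"{MARKER}\nParent post:\n{parent_post}"


# Stand-in registry keyed by the reply paths named in this post.
BUILDERS: Dict[str, Callable[[str], str]] = {
    name: base_reply_prompt
    for name in (
        "generate_comment",
        "_generate",
        "draft_comment",
        "draft_reply",
        "_comment_prompt",
    )
}


def engines_missing_directive(builders: Dict[str, Callable[[str], str]]) -> List[str]:
    """Return the names of builders whose prompt lacks the marker."""
    return [
        name
        for name, build in builders.items()
        if MARKER not in build("any parent post")
    ]


class DirectiveWiringTest(unittest.TestCase):
    def test_every_engine_carries_the_directive(self) -> None:
        # Checks plumbing only; says nothing about how the model behaves.
        self.assertEqual(engines_missing_directive(BUILDERS), [])
```

Run it with python -m unittest. If one builder is forked to skip the base prompt, engines_missing_directive returns that builder's name and the assertion fails with it in the diff.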

What I would do differently: the instruction currently says "acknowledge the joke lightly." That phrasing is vague enough to produce anything from one word to two sentences of setup. I should constrain it: "one clause, then move to substance." Specific constraints beat adjectives like "lightly" every time when you are prompting a model.

The broader lesson is not "prompt engineering beats classifiers" in the abstract. It is: before adding infrastructure, check whether the existing infrastructure has the right epistemic contract. Most tone failures in reply engines are not detection failures. They are posture failures. The model already knows what comedy is. It just was not asked to check first.