AI reviewers fall for repackaging attacks

Presentation‑only edits can inflate AI‑reviewer scores by more than a point on a ten‑point scale. The paper demonstrates that a closed‑loop attack, which edits only the abstract, contribution statements, and narrative while leaving all scientific evidence untouched, raises the average reviewer rating by 1.21 points. This gain is far larger than what ordinary prose polishing achieves, and it exposes a new attack surface in automated review systems.

Previously, robustness concerns about AI‑driven peer review centered on hidden prompts, prompt injection, or explicit content manipulation. Those attack vectors required the adversary to embed concealed instructions or modify figures, equations, and results, and defenses were built around detecting such overt tampering.

Across three mainstream AI reviewers, adversarial repackaging achieves a 75.1% attack success rate and a mean score gain of +1.21/10 [1]. The method iteratively collects reviewer feedback, mutates the presentation, and keeps the version that maximizes the score, showing that the review pipeline can be steered without touching the underlying science.
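The feedback, mutate, select loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `repackage`, `toy_review`, `toy_mutate`, and the keyword-counting "reviewer" are all hypothetical stand-ins for an LLM reviewer and an LLM rewriter, and only the presentation text is ever passed around, since the evidence is assumed fixed.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# a reviewer maps presentation text to (score, feedback);
# a mutator maps (text, feedback) to candidate rewrites.
Reviewer = Callable[[str], Tuple[float, str]]
Mutator = Callable[[str, str], List[str]]


def repackage(text: str, review: Reviewer, mutate: Mutator,
              rounds: int = 3) -> Tuple[str, float]:
    """Closed loop: get feedback, mutate the presentation, keep the best."""
    best_text = text
    best_score, feedback = review(best_text)
    for _ in range(rounds):
        improved = False
        for candidate in mutate(best_text, feedback):
            score, new_feedback = review(candidate)
            if score > best_score:
                best_text, best_score = candidate, score
                feedback = new_feedback
                improved = True
        if not improved:
            break  # no candidate beat the current best; stop early
    return best_text, best_score


# Toy stand-ins so the sketch runs without any model access.
BUZZWORDS = ("novel", "principled", "state-of-the-art")


def toy_review(text: str) -> Tuple[float, str]:
    """Scores 5.0 plus 0.5 per buzzword found; feedback names a missing one."""
    hits = [w for w in BUZZWORDS if w in text]
    missing = [w for w in BUZZWORDS if w not in text]
    score = min(10.0, 5.0 + 0.5 * len(hits))
    return score, (missing[0] if missing else "")


def toy_mutate(text: str, feedback: str) -> List[str]:
    """Appends one framing sentence that addresses the feedback."""
    if not feedback:
        return []
    return [f"{text} Our approach is {feedback}."]


final_text, final_score = repackage("We study X.", toy_review, toy_mutate)
print(final_score)
print(final_text)
```

The toy reviewer is deliberately shallow so the loop has something to exploit. In a realistic test, `review` would call the reviewer model under evaluation and `mutate` would be constrained to leave results, figures, and equations unchanged.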

The backfire rate for weaknesses (31.6%) is 2.6 times that for strengths (12.4%) [1]. This asymmetry suggests that AI reviewers are easier to impress than to convince: edits that highlight strengths raise perceived merit fairly reliably, while edits that try to downplay flaws backfire far more often. In both cases the evidence is unchanged; only the reviewer's reading of it shifts, so the same results can be reinterpreted as a stronger contribution.

The study also reports that “strategies that change how the reviewer interprets the paper, such as related‑work repositioning and analytical discussion expansion, substantially outperform surface edits such as local polishing, table formatting, and algorithm boxes” [1]. In other words, attacks that reshape narrative framing are far more effective than cosmetic tweaks, which suggests that current reviewer models are not robustly anchored to the scientific content.

These results imply that any conference pipeline that relies on raw AI‑reviewer scores should incorporate content‑anchoring checks or adversarial testing against presentation‑only edits. Running the released rolling benchmark on new reviewer models should become a prerequisite for deploying them in production, lest organizers unwittingly reward papers that have merely been repackaged for higher scores.
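One way such an adversarial test could be summarized is by comparing reviewer scores before and after presentation-only edits. The sketch below is an assumption-laden illustration: the function name `presentation_robustness`, the rule that any score increase counts as a successful attack, and the example scores are all hypothetical and are not taken from the paper or its benchmark.

```python
from statistics import mean
from typing import Dict, Sequence


def presentation_robustness(original: Sequence[float],
                            repackaged: Sequence[float],
                            min_gain: float = 0.0) -> Dict[str, float]:
    """Summarize how much presentation-only edits move reviewer scores.

    Assumption: an attack "succeeds" when the score rises by more than
    min_gain. The paper may define success differently.
    """
    if not original or len(original) != len(repackaged):
        raise ValueError("need two non-empty score lists of equal length")
    gains = [after - before for before, after in zip(original, repackaged)]
    return {
        "attack_success_rate": sum(g > min_gain for g in gains) / len(gains),
        "mean_gain": mean(gains),
    }


# Made-up scores for four papers, for illustration only.
before = [5.0, 6.0, 4.0, 7.0]
after = [6.5, 6.0, 5.0, 8.0]
report = presentation_robustness(before, after)
print(report)
```

A pipeline could refuse to deploy a reviewer model whose mean gain or success rate on such a test exceeds a threshold chosen by the organizers; the appropriate threshold is a policy decision rather than something this sketch can supply.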