I built a spot-the-difference solver, and the trick to it was the same one used to find Pluto

I'm bad at spot-the-difference puzzles. The two-pictures-side-by-side kind, "find the 10 differences." I'll get six or seven and then sit there finding nothing. The last one or two are always brutal: a line that's a little longer, a brown that's slightly more orange, a difference hidden in the caption text instead of the picture.

So I built a tool that solves them with me. Not for me, with me — that distinction turned out to matter, and I'll get to why.

The trick: blinking, not subtracting

My first instinct, way back, was full automation. I had the idea around the time scikit-learn was the thing, so I figured I'd train a model to just spit out every difference. Solved, done, here's your answer.

Then I was watching a kids' science show on TV and they had this bit where they overlay two drawings and rapidly swap between them. The spot that's different appears to wiggle while everything else stays still. That was the pivot moment — that's it, I thought, and threw out the machine learning plan.

The technique has a real name: blink comparison. And here's the part I love. In 1930, Clyde Tombaugh discovered Pluto with exactly this. He'd take two photographic plates of the same patch of sky a few nights apart, blink between them, and watch for anything that moved. Fixed stars stay put; the thing that jumps between exposures is the wiggling dot. Pluto was a dot that wiggled.

So the app doesn't decide what's different for you. It lines the two halves up and lets you blink between them. You hold a finger on the screen to fade one half into the other, release to swap back. The different spot wiggles. You're the comparator. The wiggling motion is also just genuinely cute to watch, and you still get the "I found it!" feeling — which felt important for something that's supposed to be a puzzle, not a cheat sheet.
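The fade itself is just a weighted average of two aligned frames. Here's a minimal sketch of that idea, not the app's actual code: `blendFrames` is a hypothetical helper, and it assumes both halves have already been warped to the same size and are available as raw RGBA buffers. In a browser you could get the same effect by drawing the second image over the first with a canvas `globalAlpha`.

```typescript
// Hypothetical helper: linearly blend two same-sized RGBA pixel buffers.
// t = 0 shows only frame `a`, t = 1 shows only frame `b`.
function blendFrames(
  a: Uint8ClampedArray,
  b: Uint8ClampedArray,
  t: number
): Uint8ClampedArray {
  if (a.length !== b.length) {
    throw new Error("frames must have the same dimensions");
  }
  // Clamp t so a sloppy caller can't over- or under-shoot the blend.
  const k = Math.min(1, Math.max(0, t));
  const out = new Uint8ClampedArray(a.length);
  for (let i = 0; i < a.length; i++) {
    // Uint8ClampedArray rounds and clamps to 0..255 on assignment.
    out[i] = a[i] * (1 - k) + b[i] * k;
  }
  return out;
}
```

Pixels that match in both frames come out unchanged at every `t`, which is why only the differing spot appears to move.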

The actually hard part: making two photos line up

Blinking only works if the two halves sit exactly on top of each other. A photo of a placemat does not give you that for free. You shoot it at an angle, the paper bows a little, and you've got two warped trapezoids floating over each other.

Step one is perspective. The app finds the four corners of the paper (OpenCV.js: Canny edges, grab contours, pick the most rectangle-ish quad) and runs a homography to lift the trapezoid back into a flat rectangle (getPerspectiveTransform + warpPerspective). Standard stuff, works great on a clean flat print.
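To make the homography step less magical, here is a sketch of the math behind it in plain TypeScript: four corner pairs give eight linear equations, and solving them yields the 3x3 matrix. This is an illustration under assumptions, not OpenCV's implementation and not the app's code. `homographyFromQuad` and `applyHomography` are hypothetical names, and the sketch fixes the bottom-right matrix entry to 1. It only computes the mapping; the resampling of pixels that `warpPerspective` performs is not shown.

```typescript
type Point = { x: number; y: number };

// Solve for the homography that maps src[i] to dst[i] (four pairs).
// Returns the 3x3 matrix in row-major order, last entry fixed to 1.
function homographyFromQuad(src: Point[], dst: Point[]): number[] {
  if (src.length !== 4 || dst.length !== 4) {
    throw new Error("need exactly four point pairs");
  }
  // Each pair contributes two rows of an 8x8 system (augmented to 8x9).
  const rows: number[][] = [];
  for (let i = 0; i < 4; i++) {
    const { x, y } = src[i];
    const { x: u, y: v } = dst[i];
    rows.push([x, y, 1, 0, 0, 0, -u * x, -u * y, u]);
    rows.push([0, 0, 0, x, y, 1, -v * x, -v * y, v]);
  }
  // Gauss-Jordan elimination with partial pivoting.
  const n = 8;
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(rows[r][col]) > Math.abs(rows[pivot][col])) pivot = r;
    }
    if (Math.abs(rows[pivot][col]) < 1e-12) {
      throw new Error("degenerate corners (collinear or repeated points)");
    }
    const tmp = rows[col];
    rows[col] = rows[pivot];
    rows[pivot] = tmp;
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const factor = rows[r][col] / rows[col][col];
      for (let c = col; c <= n; c++) rows[r][c] -= factor * rows[col][c];
    }
  }
  const h = rows.map((row, i) => row[n] / row[i]);
  return [...h, 1];
}

// Map one point through the homography (includes the perspective divide).
function applyHomography(h: number[], p: Point): Point {
  const w = h[6] * p.x + h[7] * p.y + h[8];
  return {
    x: (h[0] * p.x + h[1] * p.y + h[2]) / w,
    y: (h[3] * p.x + h[4] * p.y + h[5]) / w,
  };
}
```

Feeding it the four detected paper corners as `src` and the corners of the target rectangle as `dst` gives a matrix that sends each trapezoid corner to its rectangle corner.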

Real photos aren't clean. Low contrast between paper and table, shadows, wood grain: the auto-detector misses. So there's a sensitivity toggle (strict / normal / loose, which nudges the detection thresholds), and if that still fails you can drag the four corners by hand.

Step two is the curve, and this is the bit I'm proud of. Even with all four corners perfect, a sheet of paper that's bowed in the middle still won't line up — the center bulges. A four-corner perspective transform can only map a flat plane. So I added a fifth anchor in the center and split the image into four triangles, warping each one independently. Five-point mesh warp. The center bulge gets absorbed.
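A sketch of how a five-point mesh like this can be set up, assuming the simplest version: fan four triangles out from the center anchor and give each one its own affine transform. The names (`fanTriangles`, `affineBetween`, `meshWarpTransforms`) are made up for illustration, and the actual drawing of each warped triangle is left out.

```typescript
type Pt = { x: number; y: number };
type Tri = [Pt, Pt, Pt];
// Affine map: x' = a*x + b*y + c,  y' = d*x + e*y + f
type Affine = { a: number; b: number; c: number; d: number; e: number; f: number };

// Split a quad (corners ordered top-left, top-right, bottom-right,
// bottom-left) into four triangles that all share the center anchor.
function fanTriangles(corners: Pt[], center: Pt): Tri[] {
  if (corners.length !== 4) throw new Error("need exactly four corners");
  const [tl, tr, br, bl] = corners;
  return [
    [tl, tr, center],
    [tr, br, center],
    [br, bl, center],
    [bl, tl, center],
  ];
}

// The unique affine transform that sends each src vertex to its dst vertex.
function affineBetween(src: Tri, dst: Tri): Affine {
  const [p0, p1, p2] = src;
  const [q0, q1, q2] = dst;
  const e1x = p1.x - p0.x, e1y = p1.y - p0.y;
  const e2x = p2.x - p0.x, e2y = p2.y - p0.y;
  const det = e1x * e2y - e2x * e1y;
  if (Math.abs(det) < 1e-12) throw new Error("degenerate source triangle");
  const a = ((q1.x - q0.x) * e2y - (q2.x - q0.x) * e1y) / det;
  const b = ((q2.x - q0.x) * e1x - (q1.x - q0.x) * e2x) / det;
  const d = ((q1.y - q0.y) * e2y - (q2.y - q0.y) * e1y) / det;
  const e = ((q2.y - q0.y) * e1x - (q1.y - q0.y) * e2x) / det;
  return {
    a, b, c: q0.x - a * p0.x - b * p0.y,
    d, e, f: q0.y - d * p0.x - e * p0.y,
  };
}

// One affine transform per triangle, in the same order as fanTriangles.
function meshWarpTransforms(
  srcCorners: Pt[], srcCenter: Pt, dstCorners: Pt[], dstCenter: Pt
): Affine[] {
  const srcTris = fanTriangles(srcCorners, srcCenter);
  const dstTris = fanTriangles(dstCorners, dstCenter);
  return srcTris.map((tri, i) => affineBetween(tri, dstTris[i]));
}

function applyAffine(m: Affine, p: Pt): Pt {
  return { x: m.a * p.x + m.b * p.y + m.c, y: m.d * p.x + m.e * p.y + m.f };
}
```

Because neighboring triangles share an edge and agree on both of its endpoints, the four transforms also agree everywhere along that edge, so the warp stays continuous across the mesh.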

That introduced a great bug. Faint X-shaped seams appeared along the triangle borders, like the image had cracked. Canvas clip() can't turn off anti-aliasing, so pixels along each triangle's edge were only partially covered and whatever was drawn underneath showed through. The fix was dumb and total: grow each triangle by 1.5px so they overlap. Seams gone. There's a law I half-believe in by now: the harder the bug, the shorter the fix. Some of the best ones are a single ugly line.
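One way to grow a clip triangle, sketched under an assumption: push every vertex away from the centroid by a fixed number of pixels. The post doesn't say how the growth is computed, so `growTriangle` is a hypothetical helper showing one plausible approach, not the app's code.

```typescript
type Vec = { x: number; y: number };

// Move each vertex `px` pixels further from the triangle's centroid so
// that neighboring clip regions overlap instead of merely touching.
function growTriangle(tri: Vec[], px: number): Vec[] {
  if (tri.length !== 3) throw new Error("need exactly three vertices");
  const cx = (tri[0].x + tri[1].x + tri[2].x) / 3;
  const cy = (tri[0].y + tri[1].y + tri[2].y) / 3;
  return tri.map((p) => {
    const dx = p.x - cx;
    const dy = p.y - cy;
    const len = Math.hypot(dx, dy);
    // A vertex sitting exactly on the centroid has no direction to move in.
    if (len === 0) return { x: p.x, y: p.y };
    const scale = (len + px) / len;
    return { x: cx + dx * scale, y: cy + dy * scale };
  });
}
```

Only the clip path needs to grow. The image transform for each triangle stays the same, so the overlapping strip is filled with correctly warped pixels rather than a gap.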

The other stuff that fought me

The long tail, kept short:

The last touchmove. Dragging a corner felt slightly wrong and I couldn't tell why. You know how a slider nudges itself off-target the instant you let go? That tiny final twitch. Turns out the last touchmove event before you lift your finger is the culprit. I ignore it now. Dropping that one event is what made dragging feel good.
Curved books and oversized sheets that don't fit one frame: there's a two-shot mode where you photograph the left and right halves separately. As a bonus it handles vertically-stacked puzzles too.
Sharing the result as a GIF. I assumed GIF was the ugly legacy option and APNG/WebP would be better. Then I posted them and nothing animated — viewers and apps just show frame one. GIF animates everywhere, even auto-plays without a tap. GIF won. One gotcha: turn dithering off, or the diffusion noise looks like differences that aren't really there.
Video export was slow with MediaRecorder, so I rebuilt it on WebCodecs + mp4-muxer, driving the GPU's hardware encoder (hardwareAcceleration: "prefer-hardware").
The odd-pixel boss bug. Video export failed on exactly one image, on Android only. Cause: H.264's 4:2:0 chroma needs even width and height, and PC encoders quietly tolerate odd dimensions while Android's hardware encoder refuses. Whether it triggered depended on your photo's resolution. Fix: truncate output dimensions to even with w & ~1. Again — nastiest bug, one line.
localStorage filled up. I'd been stuffing each saved image in as a data URL, and after a few saves it just refused to store more. Moved the saves to IndexedDB (via idb-keyval) and kept them as Blobs. Problem gone.
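The odd-pixel fix from the list above, as a small sketch. `toEvenDimensions` is a hypothetical wrapper around the `w & ~1` trick; the floor and the minimum-size check are additions for illustration, not something the post describes.

```typescript
// Round width and height down to the nearest even number, since 4:2:0
// chroma subsampling stores one chroma sample per 2x2 block of pixels.
function toEvenDimensions(
  width: number,
  height: number
): { width: number; height: number } {
  // `& ~1` clears the lowest bit: 1081 becomes 1080, 1080 stays 1080.
  const w = Math.floor(width) & ~1;
  const h = Math.floor(height) & ~1;
  if (w < 2 || h < 2) throw new Error("image too small to encode");
  return { width: w, height: h };
}
```

The result would then be used both for the canvas that renders each frame and for the `width` and `height` passed to the encoder configuration, so the two never disagree. At most one row and one column of pixels are dropped.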

There's also a second export mode I didn't expect to like as much as I do: a 60-second crossfade video. It morphs left → right over 30 seconds, then back right → left over another 30, looping. (Why round-trip? Some players snap hard back to frame one when a clip ends, and the jump ruins it.) Thirty seconds each way tested out as the sweet spot. It makes the puzzle harder, slower, more hypnotic — the opposite of solving it. Which somehow felt right.
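The round trip is a triangle wave over time. A sketch, assuming a plain linear ramp (the post doesn't say whether the real fade uses easing); `crossfadeAlpha` is a hypothetical name.

```typescript
// Blend weight for the round-trip crossfade: 0 means only the left image,
// 1 means only the right image. Ramps up for `halfPeriod` seconds, ramps
// back down for another `halfPeriod`, then repeats.
function crossfadeAlpha(tSeconds: number, halfPeriod: number = 30): number {
  const period = halfPeriod * 2;
  // Wrap into [0, period), also handling negative timestamps.
  const phase = ((tSeconds % period) + period) % period;
  return phase <= halfPeriod
    ? phase / halfPeriod
    : (period - phase) / halfPeriod;
}
```

Since the value at the very end of the clip matches the value at the start, a player that snaps back to frame one lands on the same picture it was already showing.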

Where the salad comes from

If you're wondering about the name: this started as a tool for one specific thing. A Japanese family restaurant called Saizeriya prints a notoriously evil spot-the-difference puzzle on its table mats, and the final difference is genuinely sadistic. I just wanted to beat it. (The Japanese name is a pun on a salad on the menu and on the word for these puzzles.) I half-suspect the puzzle is there as applied psychology, a thing to grind on so the wait for your food feels shorter.

The honest workflow behind the app: I got it working in about a week once the spec settled, then spent roughly three months going back to Saizeriya, eating the small shrimp salad, and fixing whatever broke that day.

But once the perspective-and-blink machinery existed, none of it was restaurant-specific. It's just two-similar-images-and-a-camera. Puzzle books, kids' menus, a quiz frozen on a TV screen — anything you can photograph, flat on a table or stuck on a wall.

What I deliberately didn't build

No server. No AI API. No image uploads. Every photo is processed in your browser and never leaves your device — zero privacy worry, zero running cost. It's a Next.js app, installable as a PWA. The UI is in English too (and there's a QR code on the home screen to hand it to a friend).

And there's a sample puzzle image you can download and load through "choose from album" if you want to try it without going to a restaurant first.

Try it

machigai-salad.llll-ll.com — works on phones and desktops. On a phone you get a camera button and a separate "choose from album" option.

Source is on GitHub.

If you try it on a puzzle that's been beating you, I'd genuinely like to know whether it worked. And if you've fought corner detection on messy real-world photos before, I'd love to hear how you handled it — that's still the weakest link.
