The Find Games Like directory I'm building at findindiegame.com runs entirely on Steam data: a nightly ETL pulls from the Steam API, generates content with Claude Haiku, and writes rows into Turso libSQL. Every game in the system has an integer appid, a Steam store URL, and optionally Humble/Fanatical affiliate links.
That pipeline covers thousands of games. But my own release, Shin KoiKoi — a free Hanafuda Koi-Koi implementation I shipped last week — lives on itch.io and has no Steam listing yet. Leaving it out of the directory felt like a solvable gap.
The naive fix would be adding an itch.io bypass flag into the ETL. I didn't want to do that. The ETL runs on a 6-hour cron, touches five tables, and assumes integer appid throughout. Injecting special-case logic for itch entries would mean auditing every query for null handling. Too much blast radius for what might stay at one or two entries.
The curated overlay: a JSON sidecar
Instead I added src/data/curated.json — a static list of GameEntry objects, manually maintained, that lives next to the ETL-populated database rather than inside it.
The merge happens in src/lib/curated.ts:
export function getMergedGames(): GameEntry[] {
  // ETL-populated entries come first and win any slug collision.
  const steam = games as unknown as GameEntry[];
  const manual = curated as unknown as GameEntry[];
  const seen = new Set(steam.map((g) => g.slug));
  const merged = [...steam];
  for (const g of manual) {
    // Append a curated entry only if no Steam entry already owns its slug.
    if (!seen.has(g.slug)) merged.push(g);
  }
  return merged;
}
getStaticPaths() in [slug].astro calls getMergedGames() instead of the old getAllGames(). The Astro build sees itch.io games as first-class entries — same slug-based routes, same template, same structured data.
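To make the precedence rule concrete, here is a minimal, self-contained sketch of the same merge run against sample data. The MiniEntry type, the mergeBySlug name, and the sample rows are illustrative assumptions, not the site's real code. The sketch also records each curated slug as it is appended, a small deviation from the function above, so that a duplicate inside the curated list itself would be skipped too.

```typescript
// Hypothetical, trimmed-down entry type; the real GameEntry has more fields.
interface MiniEntry {
  slug: string;
  name: string;
  source?: "steam" | "itch";
}

// Same idea as getMergedGames(): ETL rows first, curated rows appended
// only when their slug is not already taken.
function mergeBySlug(steam: MiniEntry[], manual: MiniEntry[]): MiniEntry[] {
  const seen = new Set(steam.map((g) => g.slug));
  const merged = [...steam];
  for (const g of manual) {
    if (!seen.has(g.slug)) {
      seen.add(g.slug); // deviation: also guards against duplicates within the curated list
      merged.push(g);
    }
  }
  return merged;
}

// Sample data (made up for illustration).
const steamRows: MiniEntry[] = [
  { slug: "example-steam-game-1", name: "Example Steam Game" },
];
const curatedRows: MiniEntry[] = [
  { slug: "shin-koikoi-4534256", name: "Shin KoiKoi", source: "itch" },
  { slug: "example-steam-game-1", name: "Colliding curated entry", source: "itch" },
];

const mergedDemo = mergeBySlug(steamRows, curatedRows);
console.log(mergedDemo.map((g) => g.slug));
```

The colliding curated entry is dropped and the Steam row keeps its slug, so a mistake in curated.json cannot shadow an ETL-generated page.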
I added two optional fields to GameEntry:
source?: "steam" | "itch";
external_url?: string;
The isSteam() helper is deliberately tiny:
export function isSteam(g: GameEntry): boolean {
  // Entries without a source predate the field, so treat them as Steam.
  return !g.source || g.source === "steam";
}
The default-to-steam behavior matters: existing Steam entries in Turso don't have a source column, so they'd silently fail a g.source === "steam" strict equality check. Defaulting unset entries to Steam avoids a backfill migration.
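A small sketch of why the default matters, using a hypothetical SourcedEntry type and made-up rows rather than the real GameEntry: a legacy row with no source fails a strict comparison but passes the defaulting helper.

```typescript
// Hypothetical minimal type for illustration only.
interface SourcedEntry {
  name: string;
  source?: "steam" | "itch";
}

// Same logic as isSteam(): a missing source is treated as Steam.
function isSteamEntry(g: SourcedEntry): boolean {
  return !g.source || g.source === "steam";
}

const legacyRow: SourcedEntry = { name: "Legacy Steam row" }; // no source set
const itchRow: SourcedEntry = { name: "Shin KoiKoi", source: "itch" };

// A strict check would misclassify the legacy row as non-Steam.
const strictResult = legacyRow.source === "steam";

console.log(strictResult);            // strict comparison on the legacy row
console.log(isSteamEntry(legacyRow)); // defaulting helper on the legacy row
console.log(isSteamEntry(itchRow));   // defaulting helper on the itch row
```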
Writing the itch.io scaffolding script
Adding entries by hand to a JSON file gets old after two or three. I wrote tools/add-itch-game.mjs, 82 lines of Node.js that takes an itch.io URL, scrapes metadata from the page's meta tags and markup, and appends a scaffolded entry.
The metadata comes from four pattern matches:
const itchPath = pick(/<meta name="itch:path" content="games\/(\d+)"/);
const ogImage = pick(/<meta property="og:image" content="([^"]+)"/);
const ogTitle = pick(/<meta name="twitter:title" content="([^"]+)"/)
  ?? pick(/<title>([^<]+)<\/title>/);
const author = pick(/<a href="https:\/\/[^"]+\.itch\.io">([^<]+)<\/a>/);
The itch:path meta gives a numeric ID I use as the appid field. Keeping appid populated avoids null handling in several downstream spots that assume it's always an integer. The slug combines the title slug with that ID: shin-koikoi-4534256.
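The article does not show pick() or the slug helper, so the following is a sketch under assumptions: makePicker and slugify are hypothetical names, and the HTML snippet is a made-up stand-in for a fetched itch.io page. It shows one plausible way to go from two of the meta tags above to an integer appid and a slug.

```typescript
// Hypothetical helper: binds a picker to one HTML string and returns
// the first capture group of a pattern, or null when nothing matches.
function makePicker(html: string): (re: RegExp) => string | null {
  return (re) => html.match(re)?.[1] ?? null;
}

// Hypothetical slug helper: lowercase, collapse non-alphanumerics into
// single hyphens, trim hyphens from both ends.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Made-up page fragment standing in for the fetched HTML.
const sampleHtml = `
<meta name="itch:path" content="games/4534256">
<meta name="twitter:title" content="Shin KoiKoi">
`;

const pickFrom = makePicker(sampleHtml);
const itchId = pickFrom(/<meta name="itch:path" content="games\/(\d+)"/);
const scrapedTitle = pickFrom(/<meta name="twitter:title" content="([^"]+)"/);

// Fail loudly instead of writing a half-empty entry.
if (itchId === null || scrapedTitle === null) {
  throw new Error("required meta tags not found");
}

const scaffoldAppid = Number(itchId); // keeps appid an integer downstream
const scaffoldSlug = `${slugify(scrapedTitle)}-${scaffoldAppid}`;
console.log(scaffoldSlug); // shin-koikoi-4534256
```

Throwing when a required tag is missing is a choice of this sketch; it surfaces a changed page layout at scaffold time rather than at build time.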
The script sets model_used: "curated-itch-auto" on scaffolded entries, and I set model_used: "curated-manual" on entries I fill in by hand. Both values are distinct from anything the nightly ETL writes (claude-haiku-4-5-*), which lets me filter the populations cleanly in any future queries or CI assertions without parsing additional fields.
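As a sketch of what such a filter might look like, the function below classifies an entry by its model_used value alone. The classifyByModel name, the Population labels, and the sample ETL value are assumptions for illustration; only the two curated strings and the claude-haiku-4-5- prefix come from the setup described above.

```typescript
// Hypothetical labels for the three populations plus a catch-all.
type Population = "curated-auto" | "curated-manual" | "etl" | "unknown";

function classifyByModel(modelUsed: string | null): Population {
  if (modelUsed === "curated-itch-auto") return "curated-auto";
  if (modelUsed === "curated-manual") return "curated-manual";
  // ETL rows carry a model identifier starting with this prefix.
  if (modelUsed !== null && modelUsed.startsWith("claude-haiku-4-5-")) return "etl";
  return "unknown"; // null, placeholders, or anything unexpected
}

// The suffix below is a placeholder, not a real model version string.
console.log(classifyByModel("claude-haiku-4-5-example"));
console.log(classifyByModel("curated-itch-auto"));
console.log(classifyByModel(null));
```

A CI check could then, for example, fail the build when any entry classifies as "unknown".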
When the script finishes, it prints a reminder to fill in good_for, avoid_if, and summary — those don't have reliable og: equivalents, so the scaffold puts placeholders. I don't run Claude against curated entries automatically yet; the count is small enough that writing them by hand is faster than wiring up another codepath.
Adapting the slug page
Two changes on the game detail page. First, the store link now branches:
const storeUrl = steamGame
  ? `https://store.steampowered.com/app/${game.appid}/`
  : (game.external_url ?? "#");
const storeLabel = steamGame ? "View on Steam" : "Play on itch.io";
Second, affiliate store links (Humble, Fanatical) are Steam-only. An itch.io game searched on Humble usually returns nothing useful:
const otherStores = steamGame
  ? [
      { label: "Humble Store", url: humbleSearchUrl(game.name, aff.humblePartner) },
      { label: "Fanatical", url: fanaticalSearchUrl(game.name, aff.fanaticalAffiliate) },
    ].filter((s): s is { label: string; url: string } => s.url !== null)
  : [];
There's no affiliate network for itch.io. Shin KoiKoi is free anyway, so I link directly and leave the affiliate block empty for itch entries.
What this doesn't handle
No AI-generated content for curated entries. The nightly ETL runs generate-content.ts against Turso rows where model_used is null or a placeholder value. Curated entries live in a static JSON file, not in Turso, so the ETL never sees them. I write their summary/good_for/avoid_if by hand. That's fine at one game; at ten or twenty I'd need to pipe them through the generation script separately or migrate them into Turso.
og: tag scraping is fragile. add-itch-game.mjs pattern-matches on specific meta tag structures that could break with a layout change. For my own game I'll notice immediately. For community-submitted entries I'd want something more stable, probably the undocumented itch.io API that third-party scrapers use, or an input form that lets submitters provide their own metadata.
Review counts stay null. Steam entries get total_reviews, total_positive, total_negative from the Steam API. Itch.io has no equivalent public endpoint. The template handles nulls by omitting the review section, but itch.io entries look thinner next to Steam entries with thousands of ratings. Whether that matters for engagement I won't know until the directory has traffic. I'll publish numbers in 30 days.
schema.org VideoGame offers. The current structured data for itch.io entries hardcodes price: "Free", which is correct for Shin KoiKoi but won't generalize. If I add a paid itch.io game I'll need to read the price from the og: tags or the curated.json entry explicitly.
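One way the offers block could generalize is to read an explicit price from the curated entry. This is a sketch, not the current implementation: the price and currency fields on the entry are hypothetical additions, and treating a missing price as free with a USD default is an assumption. The @type, price, priceCurrency, and url keys are standard schema.org Offer properties. The sketch also emits a numeric price string rather than the literal "Free", which is a choice made here, not something the site does today.

```typescript
// Hypothetical extension of a curated entry with explicit pricing.
interface PricedEntry {
  external_url?: string;
  price?: number;    // assumed field; absent means free
  currency?: string; // assumed field; ISO 4217 code such as "USD"
}

interface Offer {
  "@type": "Offer";
  price: string;
  priceCurrency: string;
  url?: string;
}

function buildOffer(g: PricedEntry): Offer {
  const price = g.price ?? 0; // assumption: no price means a free game
  return {
    "@type": "Offer",
    price: price.toFixed(2),
    priceCurrency: g.currency ?? "USD",
    // Only include url when the entry actually has one.
    ...(g.external_url ? { url: g.external_url } : {}),
  };
}

const freeOffer = buildOffer({});
const paidOffer = buildOffer({ price: 4.99, currency: "EUR" });
console.log(JSON.stringify(freeOffer));
console.log(JSON.stringify(paidOffer));
```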
What I'd do differently
The JSON sidecar was the fastest path to shipping. But if I were starting from scratch, I'd put curated entries into Turso from the beginning with a source TEXT DEFAULT 'steam' column rather than a parallel file. The sidecar means every component that builds a game list has two code paths (getAllGames() + getMergedGames()), and those diverge as the codebase grows. The tradeoff was acceptable for a one-afternoon addition, but it's technical debt I'll want to resolve before this pattern spreads to the other two sites.
Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.











