The bm25() relevance score in FTS5 returns a negative float. Lower means more relevant. I ordered results DESC for three days and wondered why garbage surfaced first.
I run 12 production Workers handling Korean D2C ad-ops. One indexes ~400K ad keywords โ brand names, product slugs, mixed Korean/Latin search terms โ and drives a dashboard that rerenders on every keystroke. The original LIKE '%ํค์๋%' scans were clocking 180โ220ms in D1. That's a hard no for typeahead. FTS5 is shipped enabled in D1, no opt-in required, so the migration looked straightforward. It wasn't.
The first real trap is Korean tokenization. FTS5's default unicode61 tokenizer splits on whitespace and punctuation. Korean doesn't use spaces between morphemes consistently โ a single compound word can contain what a user types as two separate search tokens. That mismatch kills recall on single-morpheme queries. By week 3 at 180K rows I was measuring 88% recall on a hand-verified 500-row golden set, down from 91% at 40K rows. The drop isn't catastrophic but it's directional, and it gets worse as the index grows with noisier long-tail terms. The mitigation I landed on was query-side: splitting user input and joining tokens with OR before passing to MATCH, which recovers most of the recall loss without touching the tokenizer.
const ftsQuery = query.trim().split(/\s+/).join(' OR ');
The second trap is content tables. content='kw_master' means FTS5 stores only the index, not the raw text โ good for space, bad if you forget that inserts, updates, and deletes on kw_master don't automatically propagate. You need triggers, or the index silently drifts from the source table. I set up three triggers (insert, delete, update) and a one-time backfill. P50 latency at 390K rows after six weeks: 22ms. P95: under 45ms. Both within the sub-50ms target.
There's more to the full story โ the exact trigger DDL that handles the content table sync without double-writing, how I sampled latency inside the Worker without killing performance, and the one wrangler command sequence I run at 3am when the index drifts.
I wrote up the full breakdown โ including the week-6 benchmark table and the production incident that validated the trigger setup โ over on dailymanuallab.com.











