I ported js-framework-benchmark's protocol onto Vel's core — no window, no GPU, no vsync, pure CPU — because I wanted to know if "compiles to native C++" actually translated into the layout numbers I was assuming it did. Each row is a color swatch, a flexible text label, and a button. I created 10,000 of them and measured.
The first number was great. The second number was embarrassing.
What the benchmark found
| rows | build (tree) | layout (cold) | relayout (warm) |
|------:|-------------:|--------------:|----------------:|
| 100 | 0.03 ms | 0.04 ms | 0.04 ms |
| 1000 | 0.26 ms | 0.30 ms | →19.6 ms← |
| 10000 | 1.99 ms | 2.22 ms | → 206 ms← |
Build — constructing 10,000 widgets — costs 2ms. This is where the "native C++" thesis pays off bluntly: C++ allocation crushes JavaScript's createElement. Nothing to do here; it was already faster than the framework I was benchmarking against.
Now look at the last column. Relayout (warm) is the cost of laying out a tree that did not change: same widgets, same text, same constraints, the steady state you're in every frame while the user just moves the mouse. It cost 206ms for 10k rows. At 60fps a frame is about 16.7ms, so that's not one frame; that's roughly twelve frames dropped to lay out a list that didn't move.
And the tell is in the comparison: warm relayout cost the same as the cold pass. Re-laying-out an unchanged tree was doing as much work as laying it out for the first time. That's the signature of zero memoization: every frame, from scratch, as if it had never seen this tree before.
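A cold-versus-warm measurement of this kind can be sketched as below. This is a minimal harness under stated assumptions, not Vel's actual benchmark code: `timeMs`, `LayoutTimings` and `measureColdAndWarm` are hypothetical names, and the layout pass is passed in as a plain callable.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: wall-clock time of one call, in milliseconds.
inline double timeMs(const std::function<void()>& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct LayoutTimings {
    double coldMs;  // first pass over a freshly built tree
    double warmMs;  // mean of later passes over the same, unchanged tree
};

// Runs `layout` once cold, then `warmRuns` more times without touching the tree.
inline LayoutTimings measureColdAndWarm(const std::function<void()>& layout, int warmRuns) {
    LayoutTimings t{};
    t.coldMs = timeMs(layout);
    double total = 0.0;
    for (int i = 0; i < warmRuns; ++i) total += timeMs(layout);
    t.warmMs = warmRuns > 0 ? total / warmRuns : 0.0;
    return t;
}
```

If nothing is memoized, `warmMs` comes out close to `coldMs`; with working caches it should drop well below it.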
The cost wasn't geometry — it was text
My assumption was that "layout is slow" meant the flexbox math was slow: constraint propagation, two passes, intrinsic sizing. I was wrong, and a profiler said so immediately. The 206ms was almost entirely in one place:
Text::measure → FreeTypeRasterizer::measureText, which walks each string codepoint by codepoint and asks FreeType for the advance width of every glyph via FT_Load_Glyph.
Per frame. For every label. Whether or not the text had changed.
FT_Load_Glyph is not free — it's loading and scaling a glyph outline to get its metrics. Doing it for every character of every visible string, 60 times a second, on text that is identical to last frame, is pure waste. The geometry math was a rounding error next to it. I'd been optimizing the wrong mental model of where layout time goes.
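The shape of that hot path can be sketched as follows. To keep the example compilable without FreeType, the expensive per-glyph load is replaced by an injected callable; `AdvanceLoader`, `measureUncached` and `layoutFrame` are hypothetical names, not Vel's.

```cpp
#include <functional>
#include <string>
#include <vector>

// Stand-in for the expensive per-glyph metrics load (the role FT_Load_Glyph
// plays in the post). Hypothetical signature: codepoint in, advance out.
using AdvanceLoader = std::function<float(char32_t)>;

// Uncached measurement: one loader call per codepoint, every time it is asked.
inline float measureUncached(const std::u32string& text, const AdvanceLoader& loadAdvance) {
    float width = 0.0f;
    for (char32_t cp : text) width += loadAdvance(cp);
    return width;
}

// One "frame" of layout over a set of labels that never change.
inline float layoutFrame(const std::vector<std::u32string>& labels,
                         const AdvanceLoader& loadAdvance) {
    float total = 0.0f;
    for (const auto& label : labels) total += measureUncached(label, loadAdvance);
    return total;
}
```

With L labels of C codepoints each, F frames cost L × C × F loader calls, whether or not anything changed between frames.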
Two caches, both valid forever
The fix is in engine/src/text/FreeTypeRasterizer.cpp — two process-lifetime caches:
- Per-glyph advance cache, keyed on (face, pixelSize, codepoint). A glyph's advance width never changes for a given face and size, so the first time you measure an 'e' at 32px you load it; every 'e' after that is a hashmap hit. This makes even cold layout of varied text fast, because common characters are shared across every string.
- Per-string width cache, keyed on (face, pixelSize, string). Relayout of unchanged text becomes a single lookup: measure the label once, and every subsequent frame that the text is identical is O(1). It's bounded to 200k entries so that live typing (which generates a new string every keystroke) can't grow it without limit.
The key insight that makes this safe: these values are immutable. A glyph advance for a fixed face and pixel size is a constant of the universe; it will never be different. So there's no invalidation logic, no staleness, no cache-coherence problem — the hard part of caching simply doesn't exist here. You compute it once and trust it forever.
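The two caches can be sketched roughly like this. It is an illustration under assumptions, not the code in FreeTypeRasterizer.cpp: `FaceId`, `AdvanceLoader`, `TextMeasurer` and `mix` are hypothetical, the glyph load is injected instead of calling FreeType, and the caches live in an object instead of being process-lifetime. The post only says the string cache has a fixed 200k ceiling; what happens at the ceiling (here: stop inserting) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

using FaceId = std::uint32_t;  // hypothetical stand-in for a font face handle
using AdvanceLoader = std::function<float(FaceId, std::uint32_t, char32_t)>;

struct GlyphKey {
    FaceId face; std::uint32_t pixelSize; char32_t codepoint;
    bool operator==(const GlyphKey& o) const {
        return face == o.face && pixelSize == o.pixelSize && codepoint == o.codepoint;
    }
};
struct StringKey {
    FaceId face; std::uint32_t pixelSize; std::u32string text;
    bool operator==(const StringKey& o) const {
        return face == o.face && pixelSize == o.pixelSize && text == o.text;
    }
};

// Simple hash combiner; any reasonable mixing function would do.
inline std::size_t mix(std::size_t seed, std::size_t v) {
    return seed ^ (v + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}
struct GlyphKeyHash {
    std::size_t operator()(const GlyphKey& k) const {
        std::size_t h = mix(std::hash<std::uint32_t>{}(k.face), k.pixelSize);
        return mix(h, static_cast<std::uint32_t>(k.codepoint));
    }
};
struct StringKeyHash {
    std::size_t operator()(const StringKey& k) const {
        std::size_t h = mix(std::hash<std::uint32_t>{}(k.face), k.pixelSize);
        return mix(h, std::hash<std::u32string>{}(k.text));
    }
};

class TextMeasurer {
public:
    explicit TextMeasurer(AdvanceLoader loader, std::size_t maxStrings = 200000)
        : loader_(std::move(loader)), maxStrings_(maxStrings) {}

    float measure(FaceId face, std::uint32_t pixelSize, const std::u32string& text) {
        StringKey key{face, pixelSize, text};
        auto hit = strings_.find(key);
        if (hit != strings_.end()) return hit->second;  // unchanged text: one lookup

        float width = 0.0f;
        for (char32_t cp : text) width += advance(face, pixelSize, cp);

        // Assumption: once the ceiling is reached, stop inserting (no eviction).
        if (strings_.size() < maxStrings_) strings_.emplace(std::move(key), width);
        return width;
    }

    std::size_t cachedStrings() const { return strings_.size(); }

private:
    float advance(FaceId face, std::uint32_t pixelSize, char32_t cp) {
        GlyphKey key{face, pixelSize, cp};
        auto hit = glyphs_.find(key);
        if (hit != glyphs_.end()) return hit->second;
        const float a = loader_(face, pixelSize, cp);  // the only expensive call
        glyphs_.emplace(key, a);
        return a;
    }

    AdvanceLoader loader_;
    std::size_t maxStrings_;
    std::unordered_map<GlyphKey, float, GlyphKeyHash> glyphs_;
    std::unordered_map<StringKey, float, StringKeyHash> strings_;
};
```

Neither map has an erase or invalidate path. That is the point of the immutability argument: an entry, once written, is never wrong.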
The result:
| rows | relayout before | relayout after | speedup |
|---|---|---|---|
| 1000 | 19.6 ms | 0.30 ms | ~65× |
| 10000 | 206 ms | 2.2 ms | ~93× |
A 10,000-row list now re-lays out in ~2ms, comfortably inside a 60fps frame. That's the budget Figma-class apps live in, and it was the difference between "compiles to native, therefore fast" being a slogan and being true.
The other half: don't lay out at all
Caching makes a frame cheap. The bigger win is not running the frame. Vel is damage-tracked: an atomic frameDirty flag, raised by any Widget::markDirty(), gates whether the next frame does anything. When nothing's changed, the app sits in glfwWaitEventsTimeout and uses ~0 CPU — no layout, no paint, no spin. Animating widgets re-arm the flag from their tick(); a static page just sleeps.
So the steady state is: idle costs nothing, and when something does change, the relayout it triggers is ~2ms instead of 206ms. Both halves matter. A fast frame you run 60 times a second on an idle app is still a battery fire.
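A damage-gated loop along these lines might look like the sketch below. `FrameGate` and its members are hypothetical names, and the idle wait (glfwWaitEventsTimeout in the post) is injected as a callable so the example has no GLFW dependency.

```cpp
#include <atomic>
#include <functional>

// Hypothetical sketch of a damage-gated frame loop.
class FrameGate {
public:
    // Called from any widget when something visible changed.
    void markDirty() { dirty_.store(true, std::memory_order_release); }

    // Runs one loop iteration. Returns true if a frame was actually produced.
    bool step(const std::function<void()>& layoutAndPaint,
              const std::function<void()>& waitForEvents) {
        // exchange() reads and clears the flag in one atomic operation, so a
        // markDirty() that lands during the frame is kept for the next step.
        if (dirty_.exchange(false, std::memory_order_acq_rel)) {
            layoutAndPaint();
            return true;
        }
        waitForEvents();  // nothing changed: block instead of spinning
        return false;
    }

private:
    std::atomic<bool> dirty_{true};  // assumption: the first frame always draws
};
```

An animating widget would call `markDirty()` from its tick, which re-arms the gate for the next iteration; a static page never does, so every iteration takes the waiting branch.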
What it costs
- The string cache trades memory for time, bounded crudely. 200k entries is a fixed ceiling, not an LRU — it's a backstop against live-typing churn, not a tuned eviction policy. For pathological workloads (millions of unique strings) you'd want real eviction.
- It's still a full tree walk. Relayout re-visits every node; it's just that each visit is now cheap. The honest next step is dirty-subtree layout — skipping subtrees whose constraints and content are unchanged — so a 100k-row tree doesn't pay even the cheap per-node cost. Today the whole tree is re-walked; it's fast enough that I haven't needed to, which is its own kind of answer.
- It assumes advances are independent. The per-string cache works because today a string's width is the sum of its glyphs' advances. HarfBuzz shaping will break that assumption (kerning, ligatures, complex scripts) — the cache key still holds (same string, same width) but the per-glyph cache stops being sufficient on its own.
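For the dirty-subtree idea in the second point, one possible shape is sketched below. Vel does not do this today, so everything here is hypothetical: `Node` is not Vel's Widget, and the sketch assumes a node's constraints are unchanged unless it or a descendant was marked dirty, which a real flex layout would have to verify.

```cpp
#include <memory>
#include <vector>

// Hypothetical node type used only for this illustration.
struct Node {
    Node* parent = nullptr;
    std::vector<std::unique_ptr<Node>> children;
    bool needsLayout = true;  // fresh nodes have never been laid out

    Node* addChild() {
        children.push_back(std::make_unique<Node>());
        children.back()->parent = this;
        markDirty();  // the new child changes this node's content
        return children.back().get();
    }

    // Marks this node and its ancestors. Stops at the first node that is
    // already dirty, because its ancestors were marked when it was.
    void markDirty() {
        for (Node* n = this; n != nullptr && !n->needsLayout; n = n->parent) {
            n->needsLayout = true;
        }
    }
};

// Lays out only the dirty paths; returns how many nodes were visited.
inline int layout(Node& node) {
    if (!node.needsLayout) return 0;  // clean subtree: skip it entirely
    int visited = 1;                  // real sizing work for `node` would go here
    for (auto& child : node.children) visited += layout(*child);
    node.needsLayout = false;
    return visited;
}
```

With this scheme a change to one label costs a walk from the root down to that label, not a visit to every node in the tree.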
The meta-lesson is the one I keep relearning: measure before you optimize, because your intuition about where the time goes is usually wrong. I'd have happily spent a week making the flex math faster and moved the 206ms to 204ms. The profiler pointed at text measurement in thirty seconds, and a cache with no invalidation logic — the easy kind — bought two orders of magnitude.












