Dart Looked Like a Killer on the Backend. On a Leash, It's a Paper Tiger.

I first published this in Russian on Habr; this is my own translation, lightly reworked for an English audience.

TL;DR:

The Goal: Cut the 80 MB idle memory footprint of our Node.js microservices.

The Plan: Let Claude Code migrate an OAuth2 service to Dart, drawn in by AOT compilation and "cloud-ready" marketing.

The Reality: The migration itself was flawless and nearly effortless. The runtime was the problem. Dart's VM holds on to memory after peak load by design (it's tuned for Flutter's 60 fps, not K8s): five minutes after the load test it had returned 6% of its above-idle peak versus Node's 81%. It also posted the lowest throughput of any raw runtime I tested (~1.8× lower RPS than Node.js, a third of Go's per-core efficiency).

The Lesson: AI makes mechanical migration free. The cost of an unverified runtime hypothesis stays exactly the same, and the agent will never flag it for you. I ended up back on Go.

To be clear up front: this isn't a language flame. Dart the language is genuinely pleasant: clean, predictable, a joy to write. My problem is strictly the VM's runtime behavior under Kubernetes, not the syntax.

Here's the trap I walked into, and I think it's the defining trap of building with AI agents in 2026: when an agent can rewrite an entire service in a couple of hours, writing the code stops being the expensive part. So you stop treating it as a decision. You skip the two-hour test that would have killed the idea, because running it no longer feels like it would save you anything: the code is already free.

It isn't free. The hypothesis underneath the code is exactly as expensive as it always was. This is the story of how an AI agent did everything I asked, perfectly, and that was precisely the problem.

If you're looking at Dart as a backend alternative to Node.js, learn from my mistake instead. Complete benchmark results (Go, Node.js, Dart, Bun, Deno, and .NET), with methodology, configs, and raw numbers, are on GitHub.

Disclaimer: The biggest mistake wasn't choosing Dart: it was the order of steps. Instead of running a raw benchmark on a production-like scenario on day one, I blindly trusted AOT compilation, static typing, and "ready for cloud" marketing. This isn't really about Dart; it's an expensive lesson about validating a runtime hypothesis before building architecture on top of it. To avoid micro-optimization flame wars, all manifests, CPU profiles, and source are isolated in the repository. I'm a regular JS/TS developer, not a Go or .NET guru; if you spot a flaw, PRs are welcome.

Act I: The Marketing Honeymoon

I work on a SaaS product powered by Node.js. For the most part, it does the job well: solid performance, fast development cycles, and a single language across the stack. Sure, under heavy load certain services get greedy. Our OAuth2 service, for instance, would peak at 500 MB of RAM. But tokens were issued smoothly, memory was eventually reclaimed, and any slowdown came purely from CPU-heavy cryptography.

However, when you run a SaaS with hundreds of identical microservices, an idle footprint of 80 MB per instance scales into a noticeable infrastructure bill.

We started exploring alternatives. Go is the obvious candidate, but our team is JavaScript to the bone, and Go didn't spark much enthusiasm. The endless err != nil boilerplate, and context.Context passed as the first argument everywhere, felt like a relic: a vivid reminder of Node's old callback hell, where err was always the first argument.

Then we stumbled onto Dart. On paper it ticked every box: AOT compilation, static typing, familiar syntax. The official dart.dev docs explicitly state: "Creating scalable, high performance APIs and event-driven apps are good use cases for Cloud Run." Straight from the source, so I bought in.

You know that honeymoon phase with a new technology? The landing page is pristine, the syntax is elegant, and you think: "This is the silver bullet! Why is everyone else still suffering with legacy tools?" You dive in headfirst.

We knew we wouldn't hit Go-level performance. But dropping from 80 MB to 10 MB idle? Our AI assistant, having scraped the same marketing blogs, promised exactly that. To be fair, a barebones AOT-compiled HTTP server genuinely consumed peanuts. I ran it, verified it, got inspired. Static typing, AOT, an 8× memory reduction, and no more node_modules black hole: it sounded flawless.

Keyword: sounded.

Act II: The Reality of the Codebase

Problem 1: The Backend Ecosystem Ghost Town

To test the waters we decided to rewrite our auth service. The Node.js version relied on NestJS, a custom Redis ORM, and ts-oauth2-server. Backed by Claude Code, porting a clean TS codebase with structured architecture seemed trivial. Move the structure, translate the logic, swap the runtime. What could go wrong?

Turns out Dart is Flutter's world. Outside of Flutter, the backend ecosystem is a ghost town. There's no NestJS equivalent, Redis clients can be counted on one hand, and something resembling ioredis simply doesn't exist. Sure, there are Shelf and Serverpod. But we didn't need a monolithic full-stack framework designed to bundle mobile apps (Serverpod), nor did we want to micro-manage routing like it's Express.js in 2015 (Shelf). We needed an enterprise-grade backend architecture.

I'd wanted to build a lightweight alternative to NestJS to bypass its classic pain points: bulky dependency trees, excessive runtime reflection magic. The thought of "since Dart lacks NestJS, I'll just build my own perfect mini-framework" felt incredibly empowering at the time. I was already imagining the GitHub stars. Combined with an uncapped AI agent token budget and a habit of reinventing wheels, the plan felt as precise as a Swiss watch.

Problem 2: Language Constraints & Code Generation Hell

I remember the early-2010s hype when Google pushed Dart as a JavaScript replacement. After writing actual production code in it, I finally understood why that push failed.

Want to enumerate an arbitrary object's fields dynamically, the way you'd casually for...in over a plain JS object? Without reflection, forget it. Want to invoke a static method dynamically via a reference to a Type? No chance. The Type primitive is severely limited, and invoking anything on it in AOT mode is blocked by design. Runtime reflection is stripped out for AOT. So how do you build a proper DI container without reflection?

Okay, I thought: time to accept the Dart gospel and stop writing TypeScript in Dart. I looked at how the ecosystem handles this and hit the ultimate developer nightmare: heavy code generation.

Dart's approach is highly intrusive: every file needs a manual part 'file.g.dart' declaration. Your clean source file is tightly coupled to a file that doesn't exist yet. You're left with two options: commit thousands of lines of auto-generated clutter into your repo, or run sluggish build runners on every single pipeline stage.

To minimize the mess I used an experimental flag (--enable-experiment=enhanced-parts) and wrote a custom CLI tool. The workflow: spin up the entry point in JIT mode, introspect annotations and types via dart:mirrors, convert ClassMirror to descriptors, trigger annotations, generate the files needed for AOT. Something that takes three lines of reflection in C#, Java, or TypeScript required building an entire secondary toolchain in Dart.
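As a rough illustration of that last step, here's a toy TypeScript generator that turns class descriptors into a hand-wired registration module. It's a sketch of the idea, not the author's CLI (which reads ClassMirror data and emits Dart): the InjectableDescriptor shape, the generated registerAll function, and the "./di" import path are all hypothetical. The useful property is that a missing dependency fails the build step instead of a running pod.

```typescript
// Hypothetical descriptor: what an introspection pass might extract for each
// injectable class (name, import path, constructor dependencies in order).
interface InjectableDescriptor {
  className: string;
  importPath: string;
  deps: string[];
}

// Emit a wiring module that registers every class by hand, so the compiled
// build never needs runtime reflection. Unknown dependencies are rejected
// here, at generation time, rather than at runtime.
function generateRegistrations(descriptors: InjectableDescriptor[]): string {
  const known = new Set(descriptors.map((d) => d.className));
  for (const d of descriptors) {
    for (const dep of d.deps) {
      if (!known.has(dep)) {
        throw new Error(`${d.className} depends on unknown ${dep}`);
      }
    }
  }

  const imports = descriptors.map(
    (d) => `import { ${d.className} } from "${d.importPath}";`,
  );
  // Each registration is a lazy factory, so declaration order doesn't matter.
  const registrations = descriptors.map((d) => {
    const args = d.deps.map((dep) => `c.get("${dep}")`).join(", ");
    return `  c.register("${d.className}", () => new ${d.className}(${args}));`;
  });

  return [
    "// GENERATED CODE - DO NOT EDIT",
    `import type { Container } from "./di";`, // hypothetical path
    ...imports,
    "",
    "export function registerAll(c: Container): void {",
    ...registrations,
    "}",
  ].join("\n");
}

// Hypothetical usage with two injectables.
const generated = generateRegistrations([
  { className: "Config", importPath: "./config", deps: [] },
  { className: "TokenService", importPath: "./token.service", deps: ["Config"] },
]);
console.log(generated);
```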

Paradoxically, the core language is pleasant. Writing Dart feels smooth, logical, predictable; most errors are caught early at compile time. Claude Code ported ioredis in a few hours: 4,500 lines of Dart against 23,500 lines of the original TypeScript. The syntax is clean enough that even an AI writes it elegantly. But the moment you venture beyond basic business logic, you hit a wall. And because of that friction, nobody builds tooling, which leaves the ecosystem stagnant.

Over two weeks we engineered a NestJS-like framework with hierarchical DI (similar to Angular), transport-layer isolation, request-scoping via Zone, and a code-gen CLI. Claude flawlessly ported our Redis ORM. On the surface, everything looked spectacular. Yet an unsettling feeling remained. On day 14, I finally ran a clean, isolated load test.
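For readers coming from NestJS or Angular, here's roughly what hierarchical DI without reflection boils down to, as a TypeScript sketch rather than the Dart framework itself: providers are registered as explicit factories against typed tokens, a child container resolves locally and falls back to its parent, and a per-request child scope holds request-specific values. Token, Container, CONFIG, and REQUEST_ID are hypothetical names. The Dart version scopes requests through a Zone; the closest Node equivalent is AsyncLocalStorage, but this sketch passes the child scope explicitly to stay dependency-free.

```typescript
// A typed key for a provider; the optional phantom field carries T so that
// get() returns the right type without any runtime reflection.
class Token<T> {
  readonly _type?: T;
  constructor(readonly name: string) {}
}

type Factory<T> = (scope: Container) => T;

class Container {
  private readonly factories = new Map<Token<unknown>, Factory<unknown>>();
  private readonly instances = new Map<Token<unknown>, unknown>();

  constructor(private readonly parent?: Container) {}

  register<T>(token: Token<T>, factory: Factory<T>): this {
    this.factories.set(token, factory);
    return this;
  }

  // Resolve in this scope first (one instance per scope), then walk up the
  // parent chain, Angular-style.
  get<T>(token: Token<T>): T {
    if (this.instances.has(token)) {
      return this.instances.get(token) as T;
    }
    const factory = this.factories.get(token);
    if (factory !== undefined) {
      const instance = factory(this) as T;
      this.instances.set(token, instance);
      return instance;
    }
    if (this.parent !== undefined) {
      return this.parent.get(token);
    }
    throw new Error(`No provider for ${token.name}`);
  }

  createChild(): Container {
    return new Container(this);
  }
}

// Hypothetical usage: an app-wide singleton plus a per-request value.
interface Config {
  issuer: string;
}
const CONFIG = new Token<Config>("Config");
const REQUEST_ID = new Token<string>("RequestId");

const root = new Container().register(CONFIG, () => ({ issuer: "https://auth.example" }));

function handleRequest(requestId: string): string {
  // One child scope per request; it is discarded when the request ends.
  const scope = root.createChild().register(REQUEST_ID, () => requestId);
  return `${scope.get(CONFIG).issuer} handled ${scope.get(REQUEST_ID)}`;
}

console.log(handleRequest("req-1")); // https://auth.example handled req-1
```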

Act III: The Production Reality Check

Problem 3: Performance Under Throttling

I set up a straightforward benchmark: three endpoints, Postgres, Redis, and identical resource constraints for every runtime. No frameworks, no ORMs, just raw HTTP servers, so the numbers measure the runtimes rather than the frameworks on top of them.

I expected Dart to land slower than Go but comfortably ahead of Node.js. I named the benchmark repo go-vs-dart. The numbers quickly proved that was the wrong title.

At rest, Dart looked unbeatable: instant boot, 3 Mi RSS versus Node's 18 Mi, a tiny 5 MB Docker image. Then traffic hit, and the illusion collapsed. Here's the head-to-head at 500 VUs (k6 virtual users), at both the generous profile (1000m = a full dedicated core) and the throttled one (100m); medians across 3 runs, with a 256 Mi memory cap:

Runtime	RSS idle	RSS peak	Returned	p95 @1000m	p95 @100m
Go	1 Mi	29 Mi	75%*	0.4 s	2.9 s
Node.js	18 Mi	39 Mi	81%	0.5 s	5.0 s
Dart	3 Mi	39 Mi	6%	0.9 s	9.5 s

Returned = share of above-idle memory released back to the OS within 5 min after the run. * Go: the scavenger releases memory gradually (29→24→16→8 Mi), trending toward ~1 Mi given more time. The full field (Bun, Deno, .NET, NestJS, Rust) is in the repo.
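To make the Returned column concrete, here's a minimal TypeScript sketch of the metric as defined above: the fraction of above-idle memory released, (peak - after) / (peak - idle). The function name is mine, not the repo's, and the inputs are the rounded Mi values from the table, so percentages only reproduce approximately. For Go I've assumed the 8 Mi at the end of the 29→24→16→8 sequence is the five-minute reading, which is what the 75% implies.

```typescript
// Share of above-idle memory handed back to the OS after a load test:
// (peak - after) / (peak - idle). Inputs are RSS readings in Mi.
function returnedShare(idleMi: number, peakMi: number, afterMi: number): number {
  if (peakMi <= idleMi) {
    throw new RangeError("peak must be above idle for the share to be defined");
  }
  return (peakMi - afterMi) / (peakMi - idleMi);
}

// Readings from the table above (rounded Mi).
const dartShare = returnedShare(3, 39, 37); // 2 / 36
const goShare = returnedShare(1, 29, 8); // 21 / 28, assuming 8 Mi is the 5-min reading

console.log(`Dart returned ${(dartShare * 100).toFixed(0)}%`); // Dart returned 6%
console.log(`Go returned ${(goShare * 100).toFixed(0)}%`); // Go returned 75%
```

The denominator is what makes Dart look so bad: it went 36 Mi above idle and gave back only 2 of them.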

Three findings stand out, and not one of them improves when you give Dart more CPU.

Throughput. At a full core, Dart handles 741 RPS, compared to Go's 1,656 and Node's 1,321. Throttle all three down to a tenth of a core and the order doesn't move: Dart 67, Go 209, Node 128. The clearest signal is normalized throughput (RPS per 100m of CPU), which barely shifts between profiles: Dart sits at ~74 RPS per 100m of core no matter what, a third of Go's 256 and about half of Node's 135. So this isn't a throttling artifact. Hand Dart a whole idle core and it's still last; the inefficiency is in the runtime, not the CPU budget.

Memory. This is the real dealbreaker. After the load test, Go hands 75% of its peak back to the OS and Node 81%; Dart returns 6%. It peaks at 39 Mi and is still holding 37 Mi five minutes later. And it gets worse under pressure: throttle the CPU and Dart's peak climbs to 47–48 Mi, exactly backwards for a runtime you're squeezing to save money.

Latency. At a full core, everyone's p95 is sub-second. Drop to 100m and Dart's blows out to 9.5 s: not "slow," but cascading timeouts and an immediate outage. Node and Go degrade too (5.0 s and 2.9 s), but stay survivable. That same 100m profile, for what it's worth, is where Bun-native and NestJS get killed outright by the liveness probe; Go, Node, and Dart at least stay up.

I tried every GC flag I could find: --dontneed_on_sweep, --use_compactor, --force_evacuation, --mark_when_idle, --old-gen-heap-size. At some point even the Dockerfile had started to read like a ritual of hope rather than a configuration. Nothing moved the needle. Eventually the Dart VM team confirmed it: this is by design.

The Dart VM is carefully tuned for Flutter. Its priority is minimizing GC pause spikes to guarantee seamless 60 fps UI rendering on mobile. Returning RSS to the host OS is an afterthought. Brilliant for client-side mobile; catastrophic for Kubernetes containers.

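Kubernetes has no idea your process is just "caching memory for later." Its metrics see the container's working-set memory, which for a heap the VM refuses to release is effectively its RSS. If an HPA scales your service to 10 pods during a surge, those Dart pods retain peak memory long after traffic subsides: a memory-based HPA has no reason to scale them back in, and because memory requests have to be sized for the peak, the scheduler can't bin-pack effectively and cluster autoscalers can't downscale nodes. Node.js pods shrink back down and free cluster resources; Dart makes you pay for peak usage indefinitely.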
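For the HPA half of that argument, here's a TypeScript sketch of the core scaling rule from the Kubernetes docs: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no action when the ratio is within the default 0.1 tolerance. Everything else is an assumption for illustration: a memory-based HPA with a hypothetical 40 Mi average target, 10 pods after the surge, Dart pods at the 37 Mi they were still holding five minutes after the load test, and Node pods at ~22 Mi (their 39 Mi peak minus 81% of the 21 Mi above idle). The real controller also clamps to min/max replicas and, by default, waits out a 5-minute downscale stabilization window.

```typescript
// Core HPA rule: desired = ceil(current * currentMetric / targetMetric).
// If the ratio is within the tolerance of 1.0 (0.1 by default), the
// controller leaves the replica count alone. Min/max clamping and
// stabilization windows are deliberately omitted from this sketch.
function desiredReplicas(
  currentReplicas: number,
  currentMetric: number,
  targetMetric: number,
  tolerance = 0.1,
): number {
  const ratio = currentMetric / targetMetric;
  if (Math.abs(1 - ratio) <= tolerance) {
    return currentReplicas;
  }
  return Math.ceil(currentReplicas * ratio);
}

// Hypothetical scenario: 10 pods after a surge, target 40 Mi per pod.
const targetMi = 40;
const dartReplicas = desiredReplicas(10, 37, targetMi); // ratio 0.925: within tolerance
const nodeReplicas = desiredReplicas(10, 22, targetMi); // ratio 0.55: scale in

console.log({ dartReplicas, nodeReplicas }); // { dartReplicas: 10, nodeReplicas: 6 }
```

With these inputs Dart stays at 10 replicas while Node drops to 6: the retained heap, not the traffic, ends up setting Dart's replica count.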

And this is what gets called "ready for cloud," on a runtime whose own docs pitch it for "scalable, high performance APIs." Run it on Cloud Run (long-lived container instances, not ephemeral functions), where you pay for the memory allocated to each instance multiplied by the time it runs. A burst of 100 lands and memory jumps; Node spikes and gives it back, Dart grabs it and keeps it. A burst of 200 follows: Node climbs from idle again and settles back, Dart climbs from a floor it never lowered. Node lets you size the instance limit tight and trust it to fit; Dart makes you provision for the all-time peak, forever. So Dart on Cloud Run literally costs more than Node.js: slower to serve each request, and sitting on memory long after the peak. Then again, Google knows best what "high performance" means.

One last indicator of ecosystem neglect: our ioredis port, mechanically spit out by an AI agent, posted 5% higher RPS than the most popular, community-vetted Redis client on pub.dev. That's not praise for the AI; it's an indictment of the ecosystem. Nobody is building or optimizing heavy server infrastructure in Dart.

The marketing facade had completely crumbled.

Act IV: A Sober Retrospective

After two weeks of heavy lifting (authoring a DI engine, porting Redis clients, running load profiles), the verdict is absolute: for backend cloud workloads, Dart is a paper tiger. Its true home is Flutter. The team traded reflection for small mobile binaries, spent years on macros only to shelve them, and tuned the VM exclusively for client-side interaction.

It's a shame, because Dart was one decision short of being a serious contender, and potentially a genuine Node.js killer for cloud-native. It starts at 3 Mi, with a memory peak no worse than Node's. If it just handed that peak back within a reasonable window after load dropped, the lifecycle would already be beautiful: an HPA brings up a second pod in a second, traffic spreads across both, load falls, memory frees, Kubernetes packs the nodes tighter. You'd fit two lean Dart pods in the memory one 40 Mi Node pod occupies, and two Dart pods comfortably out-serve that single Node pod, and the per-pod RPS gap would no longer look like a death sentence. Instead, the pod grabbed 40 Mi and never gave it back. One architectural call in the GC, made for 60 fps, and the whole scenario falls apart. Had memory behaved predictably, a backend use case would have emerged, a community would have formed behind it, the I/O core would have caught up, and Dart could have been a real Node.js replacement for cloud-native.

In the end, my search for a silver bullet led me right back to Go. Yes, it can feel repetitive, verbose, and demanding with its explicit err != nil handling. But in a real production environment (throttled by CPU ceilings, juggling database network hops, needing nimble memory management), Go simply scales. It wasn't the raw-RPS champion (Bun's native APIs edged it out), but it had the best CPU efficiency in the field by a wide margin (~256 RPS per 100m of core versus Node's 135 and Dart's 74), at the smallest peak footprint (29 Mi); it never died under any throttling profile, and its scavenger unwinds memory back toward ~1 Mi idle. Gradually but fully, which is exactly what Dart refuses to do.

(Side note on Bun: its native APIs posted the highest raw RPS in the whole field, but at ~3× Go's peak memory (85 Mi vs 29 Mi), and under severe 100m CPU throttling the liveness probe killed it outright. For hardened K8s setups, it's not production-ready yet. There's also a deeper catch I'll save for its own post: on a large codebase, JSC's resident JIT machine code and allocator regions inflate RSS no matter how clean your heap is. That makes Bun a brilliant toolchain, but not a runtime I'd hand a heavy backend in production. Native vs npm vs Node numbers here.)

So I'm opening the Go tour and starting from scratch.

The Real Lesson

Commenters will rightly point out: "A two-hour k6 script on day one would have saved you two weeks." They're completely correct.

But there's a subtle trap I only understood at the end. Tools like Claude Code make mechanical migration incredibly cheap, essentially free. That's the paradox. When the cost of writing code drops to zero, it becomes dangerously easy to forget that the cost of an unverified runtime hypothesis stays identical. An AI agent won't challenge you and ask: "Are you sure this execution engine fits your infrastructure model?" It will simply build what you asked for, beautifully, swiftly, without hesitation.

Validate the runtime hypothesis first. Then let the AI raise the walls.

The benchmark is open and built to be extended: same workload, same Postgres and Redis, same K8s manifests, three CPU profiles. Adding a runtime takes one service directory plus one manifest, and the harness does the rest. Benchmarks, configs, and raw numbers: github.com/klerick/go-vs-dart. PRs and new runtimes welcome.

I'd genuinely love to see PHP 8.x in there. My last serious PHP was 5.1, and people keep telling me the modern version is a different animal, so if that's you, send a PR and let the numbers settle it. Same for any runtime you think I treated unfairly: the rules are in the README, and the harness doesn't play favorites.

P.S. This was supposed to be a quick post on preliminary findings. Instead, validating the numbers dragged me down a benchmark rabbit hole for another two weeks: warming up containers, tuning throttling quotas, evaluating .NET, Bun, and Deno.

Validate your hypothesis. Validate your benchmarks. Then write the article. One month of work instead of a two-hour test.