This article was originally published on runaihome.com
TL;DR: Computex 2026 announced a wave of "agentic AI PCs" — NVIDIA RTX Spark laptops (128GB, fall 2026), AMD Ryzen AI Max 400 desktops, and a 40-TOPS NPU floor baked into Windows 11. For home lab builders, almost none of it beats the math you already had: a used RTX 3090 still generates tokens faster than any of these unified-memory boxes for models that fit in 24GB, and the NPU numbers are marketing TOPS, not tokens per second. The new hardware wins exactly one fight — running 70B+ models that don't fit on a single consumer GPU.
| | RTX Spark N1X laptop | Ryzen AI Max 400 / Strix Halo | Used RTX 3090 tower |
|---|---|---|---|
| Best for | CUDA + large context, portable | 70B+ models in unified memory | Fastest tokens/sec under 24GB |
| Memory / bandwidth | 128GB / ~300 GB/s | 128GB / 256 GB/s (~215 real) | 24GB / 936 GB/s |
| Price | $2,899+ (fall 2026) | $1,499–$1,999 (ships now) | ~$1,050–$1,210 (used) |
| The catch | Doesn't ship until fall | ~5 tok/s on 70B; bandwidth-bound | 24GB ceiling, no big MoE |
Honest take: If you already have a used RTX 3090, Computex gave you no reason to upgrade. If you specifically need 70B+ models in memory and don't want a multi-GPU tower, a Strix Halo box does that today for less than the RTX Spark will cost this fall.
What Computex 2026 actually announced
Computex 2026 set a 45-year attendance record, and the framing across the show floor was identical from every vendor: the "agentic PC era." The pitch is that your next machine runs AI agents locally, all day, without a cloud round-trip. Strip away the keynote language and three concrete things landed that matter for local AI:
NVIDIA RTX Spark. A new system-on-chip for Windows-on-Arm laptops and compact desktops. The top tier pairs 20 Arm CPU cores with a Blackwell GPU (6,144 CUDA cores) and up to 128GB of LPDDR5X at roughly 300 GB/s of bandwidth. Partner systems from ASUS, Dell, HP, Lenovo, Microsoft, and MSI ship in fall 2026, with wider availability slipping into early 2027. We covered the full breakdown in our NVIDIA RTX Spark deep dive — the short version is that the N1X tier is the only one with enough memory to matter, and it starts above $2,899.
AMD Ryzen AI Max 400. AMD's answer to both Apple Silicon and RTX Spark: x86 APUs with up to 128GB of unified memory, the successor generation to the current Strix Halo (Ryzen AI Max+ 395). HP also showed an updated Z2 Mini G1a workstation built on the Ryzen AI PRO 400 series. AMD separately confirmed the AM5 socket lives through at least 2029.
A 40-TOPS NPU floor. Microsoft tied a July 2026 Windows 11 update (build 26200.1) to a hard requirement: an NPU capable of 40 TOPS or more. That's the line that turns a regular laptop into an officially badged "Copilot+/AI PC," and it's why Qualcomm's Snapdragon C series, Intel's NPU-equipped chips, and the RTX Spark platform all got stage time.
The unifying story is clear. The unifying benefit for someone running Ollama or ComfyUI at home is much murkier.
The number that deflates the NPU hype
The 40-TOPS requirement makes NPUs sound like the new center of gravity. They aren't — not for large language models. TOPS measures raw integer throughput, and LLM inference is almost never compute-bound. It's memory-bound: token generation spends most of its time moving model weights from memory into the compute units, so memory bandwidth — not TOPS — sets the speed.
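A back-of-envelope sketch of that memory-bound argument: if every generated token has to stream all of the model's weights once, decode speed cannot exceed bandwidth divided by model size. The bandwidth figures are the ones quoted in this article. The 4.5 bits-per-weight value for a Q4-class quant and both function names are assumptions for illustration, and real throughput lands below this ceiling.

```python
def quantized_weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (ignores KV cache and runtime overhead)."""
    return params_billions * bits_per_weight / 8


def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on tokens/sec when every token streams all weights once."""
    if bandwidth_gb_s <= 0 or weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / weights_gb


# Assumption: ~4.5 bits per weight for a Q4-class quant of an 8B model.
weights = quantized_weights_gb(8, 4.5)

# Bandwidth figures as quoted in the article (GB/s).
for name, bandwidth in [("RTX 3090", 936.0), ("Strix Halo, measured", 215.0)]:
    ceiling = decode_ceiling_tok_s(bandwidth, weights)
    print(f"{name}: at most ~{ceiling:.0f} tok/s on {weights:.1f} GB of weights")
```

Note that TOPS never appears in the calculation. Under this simplified model, a chip with more compute but the same bandwidth gets the same ceiling.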
The real-world throughput shows this brutally. Intel's Lunar Lake NPU lands around 18–20 tokens/second on LLM tasks, and an 8B model at Q4 runs roughly 15–25 tok/s overall on these AI-PC NPUs. Qualcomm's Snapdragon X Elite Hexagon NPU advertises 45 TOPS, but real throughput tracks bandwidth, not that headline figure. NPUs do deliver a genuine win — roughly 40–45% lower power on the AI tasks they're built for — which is why they're great for background features like webcam effects, live captions, and short on-device summarization. They are not where you run a 32B coding model.
For context, comfortable reading speed is about 5–8 tok/s and anything above ~15 tok/s feels real-time. An NPU laptop can clear that bar for a small model. So can a five-year-old GPU, faster, and for less money.
Where the unified-memory boxes actually help
The honest case for the Computex hardware is capacity, not speed. A used RTX 3090 has 24GB of VRAM. That's a hard wall. A Q4_K_M Llama 3.3 70B needs ~41GB, and the large MoE models everyone wants to try in 2026 — GPT-OSS 120B, Qwen3-235B variants — don't fit on any single consumer card.
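The capacity wall can be checked with simple arithmetic. This is a minimal sketch using the ~41GB figure above. The 2GB headroom for KV cache and runtime overhead is an assumed placeholder, the function names are hypothetical, and treating all 128GB of a unified-memory machine as available to the model is optimistic.

```python
def spill_gb(weights_gb: float, memory_gb: float, headroom_gb: float = 2.0) -> float:
    """GB that would not fit in memory and would have to be offloaded elsewhere."""
    return max(0.0, weights_gb + headroom_gb - memory_gb)


def fits(weights_gb: float, memory_gb: float, headroom_gb: float = 2.0) -> bool:
    """True when weights plus the headroom allowance fit entirely in memory."""
    return spill_gb(weights_gb, memory_gb, headroom_gb) == 0.0


LLAMA_70B_Q4_GB = 41.0  # figure quoted in the article

for name, memory in [("RTX 3090 (24GB)", 24.0), ("128GB unified memory", 128.0)]:
    if fits(LLAMA_70B_Q4_GB, memory):
        print(f"{name}: fits")
    else:
        print(f"{name}: short by ~{spill_gb(LLAMA_70B_Q4_GB, memory):.0f} GB")
```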
This is the one place the 128GB unified-memory machines earn their keep. The current Ryzen AI Max+ 395 (Strix Halo) runs GPT-OSS 120B at 55 tok/s and Qwen3-30B at 100 tok/s entirely in unified memory, in a $1,499–$1,999 mini PC. Both are MoE models that read only a fraction of their weights per token, which is why they stay fast. The catch is bandwidth: 256 GB/s on paper, ~215 GB/s measured, against the RTX 3090's 936 GB/s. A dense 70B at Q4 has to stream all ~41GB of weights for every token, and 215 GB/s divided by 41GB caps that at roughly 5 tok/s. That is usable for a single-user chat where you're reading along, and painful for anything agentic that loops.
NVIDIA's own preview hardware tells the same bandwidth story. The DGX Spark (GB10, the desktop sibling of the RTX Spark platform, $4,699) has 128GB but only 273 GB/s. It runs Llama 3.1 70B FP8 at 803 tokens/sec prefill but just 2.7 tokens/sec decode — the decode number is the one you feel when you're waiting for output. Qwen 2.5 72B holds around 4.6 tok/s. Those are large-model-fits-in-memory numbers, not fast numbers.
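To see why decode dominates the wait, here is a rough sketch that splits one request into its prefill and decode phases using the DGX Spark rates quoted above. The 2,000-token prompt and 500-token reply are hypothetical, and the code assumes both rates stay constant regardless of context length, which is a simplification.

```python
def response_seconds(
    prompt_tokens: int,
    output_tokens: int,
    prefill_tok_s: float,
    decode_tok_s: float,
) -> tuple[float, float]:
    """Return (prefill_seconds, decode_seconds) for a single request."""
    if prefill_tok_s <= 0 or decode_tok_s <= 0:
        raise ValueError("rates must be positive")
    return prompt_tokens / prefill_tok_s, output_tokens / decode_tok_s


# Rates from the article: Llama 3.1 70B FP8 on DGX Spark.
prefill_s, decode_s = response_seconds(
    prompt_tokens=2000,   # hypothetical prompt size
    output_tokens=500,    # hypothetical reply length
    prefill_tok_s=803.0,
    decode_tok_s=2.7,
)
print(f"prefill: {prefill_s:.1f} s, decode: {decode_s:.1f} s")
```

Under these assumptions the prompt is processed in a few seconds while the reply takes about three minutes, so the fast prefill number does little to shorten the wait.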
The comparison that actually matters
Put the three approaches against the work a home lab actually does:
| Workload | Used RTX 3090 (24GB, 936 GB/s) | Strix Halo / Ryzen AI Max (128GB, ~215 GB/s) | 40-TOPS NPU laptop |
|---|---|---|---|
| 8B model, Q4 | ~80–90+ tok/s | ~34–38 tok/s | ~15–25 tok/s |
| 30B MoE (e.g. Qwen3-30B) | Fits, fast | ~100 tok/s | Won't fit comfortably |
| 70B dense, Q4 | Doesn't fit (needs offload) | ~5 tok/s | No |
| 120B MoE | No | ~55 tok/s | No |
| Power draw | ~285W under load | 45–120W class | 15–45W class |
The pattern: bandwidth wins for anything that fits in 24GB, and capacity wins only past that line. No single Computex announcement both matches a discrete GPU on speed and runs the models a discrete GPU can't hold. You pick your constraint.
For a daily 7B–14B coding assistant, the most common home lab workload, the RTX 3090 isn't marginally faster: on the 8B row above it is more than twice as fast (~80–90+ tok/s against ~34–38), because the entire model lives in high-bandwidth VRAM. An RTX 5090 widens that gap further (1,792 GB/s, ~186 tok/s on Qwen3 8B Q4), which is why our GPU buying guide still anchors on discrete cards for most builders.
A real gotcha: Windows-on-Arm and the RTX Spark
The RTX Spark laptops are exciting partly because they bring a full CUDA stack to a portable Windows machine. But they run Windows on Arm, and that introduces a tax most home lab tutorials gloss over. A lot of the local-AI tooling ecosystem ships x86-64 binaries: some llama.cpp build variants, certain ComfyUI custom nodes with compiled dependencies, and a long tail of Python wheels that don't have Arm64 builds yet. Expect to hit ImportError or Illegal instruction on packages that assume x86, and to fall back to emulation (slower) or to wait for native Arm64 wheels. AMD's Ryzen AI Max line avoids this entirely — it's x86, so your existing Linux/Windows toolchain just works. That software-compatibility difference is a real reason to favor Strix Halo today over waiting for RTX Spark, beyond the price and ship-date gap.
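Before installing a stack on an unfamiliar machine, it can help to confirm which architecture your Python interpreter reports. This is a minimal sketch built on the standard library's `platform.machine()`. The `classify_arch` helper and its name lists are illustrative assumptions, and an x86-64 Python running under emulation may report the emulated architecture rather than the physical one.

```python
import platform

# Common spellings reported by platform.machine() on Windows, Linux, and macOS.
ARM64_NAMES = {"arm64", "aarch64"}
X86_64_NAMES = {"amd64", "x86_64"}


def classify_arch(machine: str) -> str:
    """Map a platform.machine() string to 'arm64', 'x86-64', or 'other'."""
    name = machine.strip().lower()
    if name in ARM64_NAMES:
        return "arm64"
    if name in X86_64_NAMES:
        return "x86-64"
    return "other"


arch = classify_arch(platform.machine())
if arch == "arm64":
    print("Arm64 interpreter: check that compiled packages ship native Arm64 wheels.")
else:
    print(f"Interpreter architecture: {arch}")
```

This only tells you what to look for. It cannot predict whether a given package has an Arm64 wheel, and an `Illegal instruction` crash cannot be caught from Python.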
If you don't want to buy any of it
The whole "agentic PC" pitch assumes you're buying hardware to run agents 24/7. If your actual need is a few hours of heavy inference a week — fine-tuning a model, batch-processing a dataset, testing a 120B model once — r



