Originally published on rohitraj.tech
Sina Weibo dropped VibeThinker-3B this week: a 3-billion-parameter, MIT-licensed reasoning model that matches DeepSeek V3.2 (671B) on AIME 2026 (94.3 vs 94.2) and runs from a ~6 GB file on a laptop. The catch the headlines skip: it ties on AIME but trails on harder math (HMMT 89.3 vs 90.2, IMO-AnswerBench 76.4 vs 78.3), which is exactly why the AI world is arguing about benchmarks again. This is the builder read: what actually shipped, the Spectrum-to-Signal training trick behind it, the vLLM and Ollama commands to run it (including the temperature setting that breaks it if you get it wrong), an honest comparison table, and where a tiny verifiable-reasoning model is worth wiring into an agent versus where it absolutely is not.
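To make the headline numbers concrete, here is a minimal Python sketch that works out the per-benchmark gaps, the parameter ratio, and the storage per weight implied by the file size. It uses only the figures quoted above. The function names are hypothetical, and two things are assumptions rather than facts from the release: that "3-billion" means exactly 3e9 parameters, and that 1 GB means 1e9 bytes.

```python
# Scores quoted in the text, as (VibeThinker-3B, DeepSeek V3.2).
SCORES = {
    "AIME 2026": (94.3, 94.2),
    "HMMT": (89.3, 90.2),
    "IMO-AnswerBench": (76.4, 78.3),
}


def score_gaps(scores):
    """Return small-model score minus large-model score, per benchmark."""
    return {
        name: round(small - large, 1)
        for name, (small, large) in scores.items()
    }


def param_ratio(large_billion, small_billion):
    """How many times more parameters the large model has."""
    return large_billion / small_billion


def bytes_per_param(file_gb, params_billion):
    """Storage per weight implied by a file size.

    Assumption: 1 GB = 1e9 bytes and the parameter count is exact.
    """
    return (file_gb * 1e9) / (params_billion * 1e9)


if __name__ == "__main__":
    for name, gap in score_gaps(SCORES).items():
        print(f"{name}: {gap:+.1f} points")
    print(f"Parameter ratio: {param_ratio(671, 3):.1f}x")
    print(f"Implied storage: {bytes_per_param(6, 3):.1f} bytes per parameter")
```

Under those assumptions the file works out to roughly two bytes per parameter, which would be consistent with 16-bit weights, although the text above does not state the precision. The gaps also show the pattern the paragraph describes: a fraction of a point ahead on AIME, then about one to two points behind on the harder sets.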
Read the full version with code samples, diagrams, and architecture details: VibeThinker-3B: A 3B Reasoning Model That Rivals 671B Giants (2026)
More engineering notes: rohitraj.tech/en/notes

