The Benchmark That Changed How I Structure Python Backends
I tested three concurrency models (asyncio coroutines, threading with ThreadPoolExecutor, and multiprocessing with Pool) by hammering each with 10,000 HTTP requests to a local Flask server. The results weren't what I expected: coroutines finished in 2.1 seconds using 45 MB of RAM, threading took 3.8 seconds with 340 MB, and multiprocessing clocked 12.7 seconds while consuming 1.2 GB.
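The I/O-bound gap between the first two models can be reproduced without a Flask server at all. Here is a minimal, self-contained sketch that stands in a `sleep` for the network round-trip (the delay, task count, and worker count are illustrative assumptions, scaled down from the article's 10,000 requests):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

N_TASKS = 200      # scaled down from the article's 10,000 requests
IO_DELAY = 0.01    # hypothetical network latency, in seconds

async def fake_request_async():
    # Stand-in for an HTTP call: yields control while "waiting" on I/O,
    # so thousands of these can be in flight on one thread.
    await asyncio.sleep(IO_DELAY)

def fake_request_blocking(_):
    # Blocking equivalent: the worker thread sleeps for the whole delay.
    time.sleep(IO_DELAY)

def bench_asyncio():
    async def main():
        await asyncio.gather(*(fake_request_async() for _ in range(N_TASKS)))
    start = time.perf_counter()
    asyncio.run(main())
    return time.perf_counter() - start

def bench_threads(workers=50):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_request_blocking, range(N_TASKS)))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"asyncio: {bench_asyncio():.3f}s")
    print(f"threads: {bench_threads():.3f}s")
```

The thread pool's concurrency is capped by `max_workers` (and each thread carries its own stack), while the event loop's is effectively capped only by memory, which is where the 45 MB vs 340 MB gap comes from.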
But raw speed doesn't tell the whole story.
The moment I added CPU-bound JSON parsing (simulating real-world API response processing), everything flipped. Multiprocessing suddenly became competitive for certain workloads, and threading revealed edge cases where it silently degrades under the GIL. This post breaks down when each model actually wins, backed by profiling data and memory snapshots you can reproduce.
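The GIL degradation is easy to demonstrate in isolation. In this sketch (the payload shape and iteration counts are illustrative assumptions, not the article's exact workload), `json.loads` holds the GIL while parsing, so splitting the work across threads buys little or nothing:

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical payload standing in for an API response body.
PAYLOAD = json.dumps({"items": [{"id": i, "value": str(i) * 10} for i in range(500)]})

def parse_many(n):
    # CPU-bound: json.loads holds the GIL for the duration of each parse.
    for _ in range(n):
        json.loads(PAYLOAD)

def bench_serial(total=2000):
    start = time.perf_counter()
    parse_many(total)
    return time.perf_counter() - start

def bench_threads(total=2000, workers=4):
    # Four threads, but only one can execute Python bytecode at a time,
    # so wall-clock time stays close to the serial run.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(parse_many, [total // workers] * workers))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"serial:  {bench_serial():.3f}s")
    print(f"threads: {bench_threads():.3f}s")
```

Swapping the `ThreadPoolExecutor` for a `ProcessPoolExecutor` sidesteps the GIL by forking separate interpreters, which is why multiprocessing becomes competitive on this kind of workload despite its per-process memory cost.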
Test Setup: Isolating I/O vs CPU Bottlenecks