I recently built a sorting playground, and one question kept coming up:
How do you compare and evaluate sorting algorithms?
Not just theoretically, but in practice.
The problem
A simple benchmark sounds easy:
- run the same algorithm in each language
- measure how long it takes
- compare the results
But it quickly becomes complicated:
- Python, Rust, and C have very different performance characteristics
- large inputs can make CI runs too slow to be practical
- some algorithms are not practical to benchmark at all
The approach
Instead of chasing perfect accuracy, I focused on two other goals:
consistency and reproducibility.
Key decisions
- run benchmarks in CI (GitHub Actions)
- use fixed datasets
- run multiple iterations of each benchmark
- average the results
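The "multiple iterations, then average" decision can be sketched as a small timing harness. This is a minimal illustration, not the playground's actual code: the function name `benchmark` and its parameters are hypothetical, and it only covers the Python side. It copies the input before every run so an in-place sort never receives already-sorted data on later iterations.

```python
import random
import statistics
import time
from typing import Callable, List


def benchmark(sort_fn: Callable[[List[int]], object], data: List[int], iterations: int = 5) -> float:
    """Return the mean wall-clock time, in seconds, of sort_fn over `iterations` runs."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    timings = []
    for _ in range(iterations):
        # Sort a fresh copy each time so in-place sorts never see already-sorted input.
        sample = list(data)
        start = time.perf_counter()
        sort_fn(sample)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    # A fixed seed yields the same "random" dataset on every CI run.
    rng = random.Random(42)
    data = [rng.randint(0, 1_000_000) for _ in range(10_000)]
    mean_s = benchmark(list.sort, data, iterations=5)
    print(f"list.sort: {mean_s * 1000:.3f} ms (mean of 5 runs)")
```

Since the post says results are averaged, the sketch uses the mean. A median would be less sensitive to a single noisy CI run, if that ever turns out to matter.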
Input sizes
- small
- medium
- large
Not every language runs every size, though, because some runs would take too long.
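One way to back the three tiers with fixed datasets is to derive each one from a seed, so every run sees identical input. The element counts in `SIZE_TIERS` and the `make_dataset` helper are hypothetical; the playground's real sizes are not stated here. Python's generator output is not reproduced by Rust or C, so a cross-language setup would more likely generate each dataset once and save it to a file that every language reads; that part is an assumption too.

```python
import random
from typing import Dict, List

# Hypothetical element counts; the playground's actual tier sizes may differ.
SIZE_TIERS: Dict[str, int] = {
    "small": 1_000,
    "medium": 10_000,
    "large": 100_000,
}


def make_dataset(tier: str, seed: int = 12345) -> List[int]:
    """Return the same pseudo-random integers every time for a given tier and seed."""
    n = SIZE_TIERS[tier]
    # A private Random instance keeps the result independent of the global RNG state.
    rng = random.Random(seed)
    return [rng.randint(0, 10 * n) for _ in range(n)]
```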
The reality of cross-language benchmarking
One important thing I learned:
Not all languages should run the same workload.
For example:
- Rust, C, and C++ handle large datasets comfortably
- Python can become extremely slow on large inputs
- running every workload in every language, just to be “fair”, is not practical
Practical constraints
So I introduced constraints:
- Python skips large datasets
- heavy algorithms run on reduced workloads
- some algorithms opt out entirely
This makes the system:
- fast enough to run in CI
- stable
- still useful for comparison
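The constraints above boil down to a single yes/no decision per (algorithm, language, size) combination. The sketch below encodes them as a hypothetical `should_run` check. Which algorithms count as "heavy", the cap they get (here: nothing above medium), and the opt-out list are all illustrative assumptions, not the project's real configuration.

```python
from typing import Dict, Set

# Ordered from smallest to largest input.
SIZES = ["small", "medium", "large"]

# Hypothetical per-language ceiling: Python skips the large tier.
MAX_SIZE_BY_LANGUAGE: Dict[str, str] = {"python": "medium"}

# Hypothetical set of quadratic-time algorithms, capped at a smaller tier everywhere.
HEAVY_ALGORITHMS: Set[str] = {"bubble_sort", "insertion_sort", "selection_sort"}
HEAVY_MAX_SIZE = "medium"

# Hypothetical opt-outs, e.g. algorithms whose running time is unbounded.
OPTED_OUT: Set[str] = {"bogo_sort"}


def should_run(algorithm: str, language: str, size: str) -> bool:
    """Decide whether a benchmark combination is executed in CI."""
    if algorithm in OPTED_OUT:
        return False
    rank = SIZES.index(size)
    # Languages without an explicit ceiling may run every tier.
    limit = MAX_SIZE_BY_LANGUAGE.get(language, SIZES[-1])
    if rank > SIZES.index(limit):
        return False
    if algorithm in HEAVY_ALGORITHMS and rank > SIZES.index(HEAVY_MAX_SIZE):
        return False
    return True
```

Keeping the rules in data rather than scattered `if` statements makes it easy to see, and change, exactly what CI will skip.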
Incremental benchmarking
Another key idea:
don’t re-run everything
- new algorithms → benchmarked
- existing ones → reused
This keeps CI time under control.
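Incremental benchmarking can be sketched as "load the stored results, benchmark only the names that are missing, write everything back". In this sketch `incremental_run` and the `run_benchmark` callback are hypothetical, and results are keyed by algorithm name only, matching the new/existing split described above.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def incremental_run(
    algorithms: Iterable[str],
    results_path: Path,
    run_benchmark: Callable[[str], dict],
) -> Dict[str, dict]:
    """Benchmark only algorithms missing from the stored results; reuse the rest."""
    results: Dict[str, dict] = {}
    if results_path.exists():
        results = json.loads(results_path.read_text(encoding="utf-8"))
    for name in algorithms:
        if name not in results:
            # New algorithm: benchmark it now.
            results[name] = run_benchmark(name)
        # Existing algorithm: keep the stored result untouched.
    results_path.write_text(json.dumps(results, indent=2, sort_keys=True), encoding="utf-8")
    return results
```

One trade-off of keying by name: an algorithm whose implementation changes keeps its old numbers. Adding a hash of the source file to the key would catch that, at the cost of more re-runs.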
What the system produces
Each algorithm gets:
- per-language results
- per-size measurements
- aggregated data
The output is stored as static JSON and rendered in the UI.
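As an illustration of what one algorithm's static JSON file might contain, the sketch below nests measurements by language and then by size, and adds one possible aggregate: the fastest language at each size. The field names and the choice of aggregate are assumptions; the playground's real schema may look quite different.

```python
import json
from typing import Dict


def build_result(algorithm: str, timings_ms: Dict[str, Dict[str, float]]) -> str:
    """Serialize one algorithm's results; timings_ms maps language -> size -> mean ms."""
    # Hypothetical aggregate: the fastest language at each input size.
    fastest_by_size: Dict[str, dict] = {}
    for language, sizes in timings_ms.items():
        for size, ms in sizes.items():
            best = fastest_by_size.get(size)
            if best is None or ms < best["mean_ms"]:
                fastest_by_size[size] = {"language": language, "mean_ms": ms}

    payload = {
        "algorithm": algorithm,
        "results": timings_ms,  # per-language, per-size measurements
        "aggregate": {"fastest_by_size": fastest_by_size},
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```

Because the output is plain JSON, any static host can serve it and the UI only needs to fetch and render it; no backend is involved at read time.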
Why build this?
Because combining:
- visualization
- comparison
- reproducible benchmarks
makes algorithms much easier to understand.
Future ideas
- more languages
- better scoring models
- workload-specific comparisons
All while keeping it simple enough to run in CI.
If you’re interested, you can explore it here (open source under the MIT license):
https://sorting.1234567890.dev/benchmark
https://github.com/T-1234567890/sort-playground