Compiler Pipeline Performance Characterization: Lexing, Parsing, Type-Checking, Bytecode Compilation, and VM Execution
Tags: compiler, performance, research, opensource
Summary
This dataset measures the compilation pipeline of the Kasteran programming language compiler. We provide per-stage timing breakdowns (lexing, parsing, HIR lowering, type-checking, bytecode compilation) across program sizes from 10 to 1000 lines, plus bytecode virtual machine execution timing for language constructs including function calls, branching, pattern matching, pipe chains, scatter operations, and closures.
Methodology
All measurements were taken on an Intel i7-1260P. Each data point is the median of 100 runs. The compiler was built with Rust 1.96.0 at opt-level=3.
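The median-of-100-runs procedure can be sketched as follows. This is a minimal illustration only, not the harness used to produce the dataset: `median_runtime_ms` and `workload` are hypothetical names, and the workload is a stand-in for a real compiler stage.

```python
import statistics
import time


def median_runtime_ms(fn, runs=100):
    """Call fn() `runs` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        elapsed_ns = time.perf_counter_ns() - start
        samples.append(elapsed_ns / 1e6)  # nanoseconds -> milliseconds
    return statistics.median(samples)


def workload():
    # Hypothetical stand-in; a real harness would invoke one pipeline
    # stage (lexing, parsing, type-checking, ...) here instead.
    return sum(i * i for i in range(1000))


print(f"median: {median_runtime_ms(workload):.4f} ms")
```

The median is less sensitive than the mean to occasional slow runs caused by scheduling or frequency scaling, which matters on a laptop-class CPU.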
Key Results
- Total compilation time scales approximately linearly with source size: from 0.27 ms for 10 lines to 21.8 ms for 1000 lines
- Type-checking dominates compile time (~42% of total)
- Bytecode compilation is consistently ~32% of total time
- VM execution: function call 0.42 µs, pipe chain 0.25 µs, scatter 1.80 µs
- Compilation rate: ~45,000 lines/sec for 1000-line programs
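The throughput and per-line figures follow directly from the two endpoint timings quoted above. The short calculation below reproduces that arithmetic; the helper names are illustrative and are not part of the dataset.

```python
def lines_per_second(lines, total_ms):
    """Compilation throughput in source lines per second."""
    return lines / (total_ms / 1000.0)


def us_per_line(lines, total_ms):
    """Average compile cost per source line, in microseconds."""
    return total_ms * 1000.0 / lines


# Endpoint measurements taken from the key results: (lines, total ms)
small = (10, 0.27)
large = (1000, 21.8)

print(f"{lines_per_second(*large):.0f} lines/sec")   # 45872 lines/sec
print(f"{us_per_line(*small):.1f} us/line at 10 lines")    # 27.0
print(f"{us_per_line(*large):.1f} us/line at 1000 lines")  # 21.8
```

The 1000-line figure works out to roughly 45,900 lines/sec, in line with the rounded rate quoted above. The per-line cost is 27.0 µs at 10 lines and 21.8 µs at 1000 lines, so scaling is close to proportional but not exactly so.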
Data Files
- stat_compilation_pipeline.csv: Per-stage timing by program size
- stat_vm_execution.csv: VM instruction timing by construct type
- raw_compilation_pipeline.csv: Full raw measurements
- raw_vm_execution.csv: Raw VM timing data
Dataset: https://doi.org/10.7910/DVN/KFK12Y
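As a starting point for working with the per-stage file, the sketch below computes each stage's share of total compile time per program size. The column names `program_lines`, `stage`, and `median_ms` are assumptions made for illustration; check the actual CSV header and pass the real names as arguments.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def stage_shares(rows, size_col="program_lines", stage_col="stage",
                 time_col="median_ms"):
    """Return {program size: {stage: fraction of total time}}.

    The default column names are hypothetical, not taken from the dataset.
    """
    totals = defaultdict(float)
    per_stage = defaultdict(lambda: defaultdict(float))
    for row in rows:
        size = int(row[size_col])
        elapsed = float(row[time_col])
        per_stage[size][row[stage_col]] += elapsed
        totals[size] += elapsed
    return {
        size: {stage: t / totals[size] for stage, t in stages.items()}
        for size, stages in per_stage.items()
        if totals[size] > 0  # skip sizes with no recorded time
    }
```

With the file downloaded locally, `stage_shares(load_rows("stat_compilation_pipeline.csv"))` would give the per-stage fractions, which can be compared against the type-checking and bytecode compilation shares reported in the key results.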












