Polars 1.0 Lazy Evaluation vs DuckDB 1.0 SQL on Parquet: Query Offload Explained
When working with large Parquet datasets, two tools dominate modern analytics stacks: Polars 1.0 with its lazy evaluation engine, and DuckDB 1.0, the embedded OLAP database that runs SQL directly on Parquet files. A key question for engineers is: which queries offload processing to the respective engine, and how does that impact performance?
What Is Query Offload?
Query offload refers to pushing computation (filtering, projection, aggregation) down to the storage or engine layer, rather than loading full datasets into application memory. For Parquet, this means the engine reads only relevant row groups, columns, and statistics to execute queries without scanning entire files.
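To make this concrete, here is a minimal sketch of the metadata an engine can inspect before reading any row data: the schema, the number of row groups, and per-column min/max statistics used for pruning. It uses pyarrow purely for illustration, and the file name data.parquet and its columns are placeholders.

```python
import pyarrow.parquet as pq

# Opening the file only reads the footer/metadata; no row data is loaded yet.
pf = pq.ParquetFile("data.parquet")  # hypothetical file name
meta = pf.metadata

print(meta.num_row_groups, "row groups,", meta.num_columns, "columns")

# Per-row-group, per-column statistics are what make predicate pushdown possible:
# an engine can skip an entire row group if its min/max range cannot satisfy the filter.
for rg in range(meta.num_row_groups):
    col = meta.row_group(rg).column(0)
    stats = col.statistics
    if stats is not None:
        print(f"row group {rg}: {col.path_in_schema} min={stats.min} max={stats.max}")
```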
Polars 1.0 Lazy Evaluation: How Offload Works
Polars 1.0’s lazy API builds a logical query plan first, then optimizes it before execution. When reading Parquet via pl.scan_parquet(), Polars:
- Reads Parquet file metadata (schema, row group stats, column chunk offsets) without loading data
- Prunes row groups that don’t match filter predicates using min/max statistics
- Selects only required columns (projection pushdown) to avoid reading unnecessary data
- Offloads aggregations, joins, and filters to its multi-threaded Rust engine when possible
Queries that offload fully in Polars include column selection, filters that can be resolved against row-group statistics, simple aggregations (sum, count, mean) on scanned Parquet, and joins between lazy Parquet scans. Queries that require full data materialization (e.g., user-defined Python functions, or other operations that must leave the Rust engine) break offload and load the data into memory.
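As a rough illustration, the sketch below (assuming a local data.parquet; the column names col1–col3 are placeholders) builds a lazy query whose filter, projection, and aggregation are all pushed into the Parquet scan. Calling explain() prints the optimized plan so you can verify the pushdown before collecting.

```python
import polars as pl

lazy = (
    pl.scan_parquet("data.parquet")           # only metadata is read at this point
      .filter(pl.col("col3") > 100)           # predicate pushdown: row groups pruned via min/max stats
      .group_by("col1")
      .agg(pl.col("col2").sum().alias("col2_sum"))  # aggregation runs in the Rust engine
)

# The optimized plan should show the filter and column selection folded into the
# Parquet scan node rather than appearing as separate in-memory steps.
print(lazy.explain())

df = lazy.collect()  # execution happens here, entirely inside Polars' engine

# By contrast, applying a Python UDF with .map_elements(...) forces the affected
# data to be materialized and processed row by row in Python, breaking offload.
```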
DuckDB 1.0 SQL on Parquet: Offload Mechanics
DuckDB 1.0 treats Parquet files as first-class tables via its built-in Parquet reader. When running a SQL query such as SELECT col1, SUM(col2) FROM 'data.parquet' WHERE col3 > 100 GROUP BY col1 (see the sketch after this list), DuckDB:
- Uses Parquet metadata to prune row groups and columns before scanning
- Offloads all standard SQL operations (filters, aggregations, joins, window functions) to its vectorized execution engine
- Supports predicate pushdown, projection pushdown, and aggregation pushdown to Parquet reads
- Can offload joins between multiple Parquet files to the engine without materializing intermediate results
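Here is the query from above run through DuckDB's Python API, as a minimal sketch assuming a local data.parquet with columns col1–col3. Prepending EXPLAIN prints the physical plan, where the col3 filter and the column projection should appear inside the Parquet scan operator.

```python
import duckdb

query = """
    SELECT col1, SUM(col2) AS col2_sum
    FROM 'data.parquet'
    WHERE col3 > 100
    GROUP BY col1
"""

# Print the physical plan; the filter and projection are pushed into the Parquet scan.
duckdb.sql("EXPLAIN " + query).show()

# Execute the query inside DuckDB's engine and fetch only the final result.
result = duckdb.sql(query).df()
print(result)
```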
Almost all valid SQL queries on Parquet offload to DuckDB’s engine, including complex window functions, CTEs, and subqueries. The main exceptions are queries that call external (e.g., Python) UDFs, or workflows that pull the full result set back into the client for post-processing.
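As one more hedged sketch (the file names data.parquet and dims.parquet and their columns are placeholders), a CTE, a join across two Parquet files, and a window function all execute inside DuckDB's vectorized engine; only the final result set comes back to Python.

```python
import duckdb

# 'dims.parquet' is a hypothetical second file used only to illustrate a cross-file join.
sql = """
    WITH joined AS (
        SELECT o.col1, o.col2, d.label
        FROM 'data.parquet' AS o
        JOIN 'dims.parquet' AS d USING (col1)
        WHERE o.col3 > 100
    )
    SELECT
        col1,
        label,
        SUM(col2) OVER (PARTITION BY label ORDER BY col1) AS running_sum
    FROM joined
"""

# The CTE, join, filter, and window function all run inside DuckDB;
# only the final rows are materialized in Python.
print(duckdb.sql(sql).df())
```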
Key Differences in Offload Coverage
| Query Type | Polars 1.0 Lazy Offload | DuckDB 1.0 SQL Offload |
|---|---|---|
| Column selection (projection) | Full offload | Full offload |
| Row group pruning via filters | Full offload | Full offload |
| Simple aggregations (sum, count) | Full offload | Full offload |
| Complex window functions | Partial offload (may materialize) | Full offload |
| Joins between Parquet files | Full offload (lazy scans only) | Full offload |
| Python UDFs in query logic | No offload (loads data) | No offload (unless registered as a DuckDB UDF) |
When to Choose Which?
Use Polars 1.0 lazy evaluation if you’re already working in Python, need tight integration with Pandas/NumPy ecosystems, or have custom logic that fits Polars’ expression API. It offloads most standard analytics queries efficiently.
Use DuckDB 1.0 SQL if you prefer declarative SQL, need to support ad-hoc queries from multiple tools, or run complex window functions and CTEs that DuckDB offloads fully. Its SQL interface lowers the barrier for non-Python users.
Conclusion
Both Polars 1.0 and DuckDB 1.0 offload most Parquet queries to their engines, but DuckDB covers a wider range of SQL-native operations, while Polars excels in Python-centric workflows with lazy optimization. Test both with your specific workload to measure offload efficiency and performance.