Deep Dive into GBase 8a MPP Distributed Query Execution

How does a SQL statement travel through a GBase 8a cluster — from parsing and plan generation to parallel execution and final aggregation? This article explains the complete execution path, the roles of coordinator and data nodes, and common performance pitfalls in a gbase database.

1. Architecture Recap: Three Roles

GBase 8a MPP Cluster consists of three core process types:

Process	Node Type	Primary Responsibility
gcluster	Coordinator	SQL parsing, plan generation, task distribution, result assembly
gnode	Data Node	Data storage, local scan, partial aggregation, Hash Join
gcware	Cluster Manager	Heartbeat, replica consistency arbitration, failover

Clients communicate only with gcluster. gcluster holds metadata (table definitions, distribution info, replica topology) but stores no user data.

2. Full Lifecycle of a Query

Consider this typical analytical query:

SELECT dept_id, SUM(sale_amount) AS total
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY dept_id
ORDER BY total DESC
LIMIT 100;

Stage 1: Parsing and Semantic Checks (gcluster)

The SQL Parser in gcluster converts the text into an AST and performs semantic validation — verifying that tables and columns exist and that data types are compatible.

Stage 2: Query Plan Generation (gcluster)

The optimizer generates a Distributed Query Plan (DQP) based on metadata. Two core decisions are made:

Pushdown vs. aggregation: Filter conditions like WHERE order_date >= '2024-01-01' are pushed down to each gnode to avoid transferring full datasets. Because dept_id is unlikely to be the distribution key, aggregation requires each gnode to first perform partial aggregation, then redistribute the partial results by dept_id hash before doing final aggregation.
Data redistribution strategy:
- Hash Redistribute: Triggered when the JOIN/GROUP BY column is not the distribution key. Cost: network transfer + shuffle.
- Broadcast: Small tables can be broadcast to all nodes instead of being redistributed.
- No redistribution: Optimal — when the JOIN/GROUP BY column happens to be the distribution key.

Key parameters: gcluster_hash_redistribute_join_optimize and gcluster_hash_redistribute_groupby_optimize control whether small tables are broadcast to avoid unnecessary hash shuffles.

Stage 3: Task Distribution and Parallel Execution (gcluster → gnode)

gcluster splits the DQP into multiple fragments and sends them concurrently to all participating gnodes over internal TCP channels. Each gnode then uses worker threads (controlled by gbase_parallel_degree) to scan its local data segments in parallel.

gcluster
  ├─ Fragment-1 → gnode1 (local scan + partial aggregation)
  ├─ Fragment-1 → gnode2 (local scan + partial aggregation)
  └─ Fragment-1 → gnode3 (local scan + partial aggregation)
         ↓
  [Hash Redistribute by dept_id]
         ↓
  ├─ Fragment-2 → gnode1 (final aggregation)
  ├─ Fragment-2 → gnode2
  └─ Fragment-2 → gnode3
         ↓
  gcluster merges TOP 100

Stage 4: Final Merge and Return to Client (gcluster)

Each gnode streams its fragment result back to gcluster. For ORDER BY ... LIMIT 100, gcluster performs a final merge‑sort to pick the top‑N rows and returns them to the client.

3. Intermediate Tables and Debugging

For complex queries, gnodes create internal temporary tables that are automatically dropped after execution. To keep them for troubleshooting:

SET gcluster_executor_debug = 1;

⚠️ Debug only — never leave this on in production, or intermediate tables will fill the disk.

To see currently executing queries and per‑node timings:

SHOW FULL PROCESSLIST;

-- Requires prior configuration (gcluster_dql_statistic_threshold in milliseconds)
SELECT * FROM gclusterdb.dql_statistic ORDER BY exec_time DESC LIMIT 20;

4. Common Query Performance Pitfalls

Pitfall 1: Cartesian Product Causing Disk Spikes

When a JOIN condition is missing, two large tables produce a Cartesian product that can reach terabytes. Cap intermediate row counts:

# gnode gbase.cnf
_gbase_result_threshold = 1000000000  -- error if >1 billion rows

Pitfall 2: Data Skew Turning One Node into a Bottleneck

GROUP BY on a low‑cardinality column concentrates all data on a few nodes after hash redistribution. Solutions:

Choose a high‑cardinality distribution key
Enable multi‑column hash redistribution for skewed GROUP BYs:

SET _t_gcluster_distinct_multi_redist = 1;
SET _t_gcluster_hash_redistribute_groupby_on_multiple_expression = 1;

Pitfall 3: Small Tables Treated as Distribution Tables During JOINs

The optimizer may hash‑redistribute many small tables, generating excessive network traffic. Build frequently used small tables as replicated tables:

CREATE TABLE dim_region (
    region_id INT,
    region_name VARCHAR(64)
) REPLICATED;

5. Summary

Phase	Process	Key Actions
Parse & Optimize	gcluster	AST creation, DQP planning, redistribution strategy
Local Execution	gnode	Data scan, partial aggregation, Hash Join
Data Shuffle	gnode ↔ gnode	Hash Redistribute / Broadcast
Final Merge	gcluster	Merge‑sort, Top‑N, return to client

Understanding this pipeline is the key to pinpointing bottlenecks in a gbase database: is the redistribution too expensive? Is one gnode scanning too slowly? Or has gcluster become the single‑point merge bottleneck? Use EXPLAIN and dql_statistic system tables for precise diagnosis.