Project 1: Deconstructing the Query Planner and Execution Pipeline

You are provided with a 10 GB dataset simulating an e-commerce platform (schema and data-generation script included). Your task is to analyze a deliberately slow, highly complex analytical query that joins several large tables.

Using EXPLAIN (ANALYZE, BUFFERS), you must map out the execution tree and identify the bottlenecks. You will write a technical report detailing exactly why the PostgreSQL planner chose its specific access paths and join strategies (e.g., why it chose a Nested Loop over a Hash Join) by querying the pg_class and pg_statistic system catalogs.
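As a starting point, the commands below sketch how to capture the execution tree and pull the planner's supporting statistics. Table and column names (orders, customers, total_amount) are placeholders — substitute the names from the provided schema. Note that pg_stats is the human-readable view over the pg_statistic catalog.

```sql
-- Capture the full execution tree with timing and buffer I/O per node
-- (table/column names here are illustrative, not from the provided schema).
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.region, SUM(o.total_amount)
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
GROUP BY c.region;

-- Row-count and page estimates the planner bases its costs on:
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('orders', 'customers');

-- Per-column statistics (MCVs, histograms) via the pg_stats view:
SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders';
```

Comparing reltuples and the pg_stats distributions against the row estimates printed in each plan node is the most direct way to explain a Nested Loop vs. Hash Join decision.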

Finally, you must optimize the query. You are required to implement at least three distinct modifications to achieve a minimum 50% reduction in execution time. These modifications can include creating specialized indexes, altering session-level memory parameters (like work_mem), or adjusting planner cost constants (like random_page_cost). You must provide the "Before" and "After" execution plans and justify your changes.
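The three categories of modification mentioned above might look like the following sketch. The index definition and values are illustrative only — your actual choices must be justified by your plan analysis.

```sql
-- 1. A specialized index: partial + covering, so the filtered join can be
--    satisfied by an index-only scan (names and predicate are hypothetical).
CREATE INDEX idx_orders_recent ON orders (customer_id, order_date)
    INCLUDE (total_amount)
    WHERE order_date >= DATE '2024-01-01';

-- 2. Session-level memory: raising work_mem lets hashes and sorts
--    complete in RAM instead of spilling to temporary files.
SET work_mem = '256MB';

-- 3. Planner cost constants: on SSD storage, random reads are nearly as
--    cheap as sequential ones, making index scans more attractive.
SET random_page_cost = 1.1;

-- Re-run EXPLAIN (ANALYZE, BUFFERS) after each change to build the
-- "Before" and "After" comparison required by the report.
```

Apply the changes one at a time so the report can attribute the speedup to each specific modification.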

Rubric:

Execution Plan Analysis
- Excellent: Flawlessly identifies the most expensive nodes; accurately accounts for buffer hits/reads and memory usage.
- Proficient: Identifies the main bottlenecks but misses nuanced buffer or memory details.
- Needs Improvement: Fails to identify the true bottleneck; misinterprets the EXPLAIN output.

Statistical Justification
- Excellent: Accurately links planner choices to specific data in pg_statistic (e.g., MCVs, histograms) and system cost constants.
- Proficient: Mentions statistics but does not pull specific catalog data to prove the planner's logic.
- Needs Improvement: Provides no correlation between the system catalogs and the execution plan.

Optimization Implementation
- Excellent: Achieves >50% speedup using three distinct, well-reasoned tuning techniques.
- Proficient: Achieves a speedup but relies entirely on basic B-tree indexes; ignores memory and cost parameters.
- Needs Improvement: Fails to achieve a significant speedup, or uses brute-force methods without understanding.

Technical Reporting
- Excellent: Explanations are highly technical, clearly written, and use PostgreSQL terminology correctly (e.g., sequential scans, materialize nodes).
- Proficient: Explanations are adequate but occasionally use imprecise terminology.
- Needs Improvement: Explanations are vague, lack depth, or fundamentally misunderstand the executor pipeline.