Chapter 1: The Anatomy of PostgreSQL: Architecture and Process Model
Chapter 2: The Journey of a Query: Lexing, Parsing, and the Traffic Cop
Chapter 3: The PostgreSQL Rule System and Query Rewriting
Chapter 4: The Query Planner Part I: Statistics and Cost Estimation
Chapter 5: The Query Planner Part II: Path Generation and GEQO
Chapter 6: The Executor: Processing the Plan Tree
Chapter 7: Advanced Indexing Under the Hood
Chapter 8: Multiversion Concurrency Control (MVCC) and Vacuuming
Chapter 9: Memory Management and Caching Strategies
Chapter 10: The Write-Ahead Log (WAL) and Crash Recovery
Chapter 11: Replication: Physical and Logical
Chapter 12: Distributed PostgreSQL and Sharding
Chapter 13: Extending the Engine
Project-Based Assignments
Project 1: Deconstructing the Query Planner and Execution Pipeline
You are provided with a 10 GB dataset simulating an e-commerce platform (schema and data-generation script provided). Your task is to analyze a highly complex, deliberately slow analytical query that joins multiple large tables.
Using EXPLAIN (ANALYZE, BUFFERS), you must map out the execution tree and identify the bottlenecks. You will write a technical report detailing exactly why the PostgreSQL planner chose its specific access paths and join strategies (e.g., why it chose a Nested Loop over a Hash Join) by querying the pg_class and pg_statistic system catalogs.
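A minimal sketch of the workflow described above. The table and column names (orders, order_items, customers) are illustrative assumptions, not the provided schema; note that pg_stats is the human-readable view over the raw pg_statistic catalog:

```sql
-- Capture the actual execution tree, with buffer hit/read counts per node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.name, SUM(oi.quantity * oi.unit_price)
FROM customers c
JOIN orders o       ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at >= DATE '2024-01-01'
GROUP BY c.name;

-- Row-count and page estimates the planner starts from:
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('customers', 'orders', 'order_items');

-- Per-column statistics (MCVs, histograms) behind selectivity estimates:
SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders';
```

Comparing the planner's estimated row counts against the actual counts in the ANALYZE output is usually the fastest way to spot where a Nested Loop was chosen on a badly underestimated inner side.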
Finally, you must optimize the query. You are required to implement at least three distinct modifications to achieve a minimum 50% reduction in execution time. These modifications can include creating specialized indexes, altering session-level memory parameters (like work_mem), or adjusting planner cost constants (like random_page_cost). You must provide the "Before" and "After" execution plans and justify your changes.
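The three kinds of modifications listed above might look like the following sketch. Object names and parameter values are assumptions for illustration, to be tuned against your own "Before" plan:

```sql
-- 1. A specialized index matching the filter (covering the join key
--    so the scan can stay index-only):
CREATE INDEX idx_orders_created_at
    ON orders (created_at) INCLUDE (customer_id);

-- 2. Session-level memory: give hashes and sorts enough work_mem
--    that a Hash Join or Sort no longer spills to disk:
SET work_mem = '256MB';

-- 3. Planner cost constants: on SSD-backed storage, lowering
--    random_page_cost makes index scans look correspondingly cheaper:
SET random_page_cost = 1.1;
```

Re-run the same EXPLAIN (ANALYZE, BUFFERS) after each change so the report can attribute the speedup to a specific modification rather than to their combined effect.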
Rubric:
| Criteria | Excellent | Proficient | Needs Improvement |
|---|---|---|---|
| Execution Plan Analysis | Flawlessly identifies the most expensive nodes; accurately calculates buffer hits/reads and memory usage. | Identifies main bottlenecks but misses nuanced buffer or memory details. | Fails to identify the true bottleneck; misinterprets the EXPLAIN output. |
| Statistical Justification | Accurately links planner choices to specific data in pg_statistic (e.g., MCVs, histograms) and system cost constants. | Mentions statistics but does not pull specific catalog data to prove the planner's logic. | Provides no correlation between system catalogs and the execution plan. |
| Optimization Implementation | Achieves >50% speedup using three distinct, well-reasoned tuning techniques. | Achieves speedup but relies entirely on basic B-tree indexes; ignores memory or cost parameters. | Fails to achieve significant speedup or uses brute-force methods without understanding. |
| Technical Reporting | Explanations are highly technical, clearly written, and correctly use PostgreSQL terminology (e.g., sequential scans, materialize nodes). | Explanations are adequate but occasionally use imprecise terminology. | Explanations are vague, lacking depth, or fundamentally misunderstand the executor pipeline. |