Chapter 1: The Anatomy of PostgreSQL: Architecture and Process Model
Chapter 2: The Journey of a Query: Lexing, Parsing, and the Traffic Cop
Chapter 3: The PostgreSQL Rule System and Query Rewriting
Chapter 4: The Query Planner Part I: Statistics and Cost Estimation
Chapter 5: The Query Planner Part II: Path Generation and GEQO
Chapter 6: The Executor: Processing the Plan Tree
Chapter 7: Advanced Indexing Under the Hood
Chapter 8: Multiversion Concurrency Control (MVCC) and Vacuuming
Chapter 9: Memory Management and Caching Strategies
Chapter 10: The Write-Ahead Log (WAL) and Crash Recovery
Chapter 11: Replication: Physical and Logical
Chapter 12: Distributed PostgreSQL and Sharding
Chapter 13: Extending the Engine
Project-Based Assignments
Project 1: Deconstructing the Query Planner and Execution Pipeline
You are provided with a 10 GB dataset simulating an e-commerce platform (schema and data-generation script provided). Your task is to analyze a highly complex, deliberately slow analytical query that joins multiple large tables.
Using EXPLAIN (ANALYZE, BUFFERS), you must map out the execution tree and identify the bottlenecks. You will write a technical report detailing exactly why the PostgreSQL planner chose its specific access paths and join strategies (e.g., why it chose a Nested Loop over a Hash Join) by querying the pg_class and pg_statistic system catalogs.
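A minimal sketch of the workflow described above. The table and column names (orders, order_items, customers) are illustrative assumptions, not the provided schema; note that pg_stats is the human-readable view over the raw pg_statistic catalog:

```sql
-- Capture the actual execution tree, with buffer hit/read counts per node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.name, SUM(oi.quantity * oi.unit_price)
FROM customers c
JOIN orders o       ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at >= DATE '2024-01-01'
GROUP BY c.name;

-- Row-count and page estimates the planner starts from:
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname IN ('customers', 'orders', 'order_items');

-- Per-column statistics (MCVs, histograms) behind selectivity estimates:
SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders';
```

Comparing the planner's estimated row counts against the actual counts in the ANALYZE output is usually the fastest way to spot where a Nested Loop was chosen on a badly underestimated inner side.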
Finally, you must optimize the query. You are required to implement at least three distinct modifications to achieve a minimum 50% reduction in execution time. These modifications can include creating specialized indexes, altering session-level memory parameters (like work_mem), or adjusting planner cost constants (like random_page_cost). You must provide the "Before" and "After" execution plans and justify your changes.
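The three kinds of modifications listed above might look like the following sketch. Object names and parameter values are assumptions for illustration, to be tuned against your own "Before" plan:

```sql
-- 1. A specialized index matching the filter (covering the join key
--    so the scan can stay index-only):
CREATE INDEX idx_orders_created_at
    ON orders (created_at) INCLUDE (customer_id);

-- 2. Session-level memory: give hashes and sorts enough work_mem
--    that a Hash Join or Sort no longer spills to disk:
SET work_mem = '256MB';

-- 3. Planner cost constants: on SSD-backed storage, lowering
--    random_page_cost makes index scans look correspondingly cheaper:
SET random_page_cost = 1.1;
```

Re-run the same EXPLAIN (ANALYZE, BUFFERS) after each change so the report can attribute the speedup to a specific modification rather than to their combined effect.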
Rubric:
| Criteria | Excellent | Proficient | Needs Improvement |
|---|---|---|---|
| Execution Plan Analysis | Flawlessly identifies the most expensive nodes; accurately calculates buffer hits/reads and memory usage. | Identifies main bottlenecks but misses nuanced buffer or memory details. | Fails to identify the true bottleneck; misinterprets the EXPLAIN output. |
| Statistical Justification | Accurately links planner choices to specific data in pg_statistic (e.g., MCVs, histograms) and system cost constants. | Mentions statistics but does not pull specific catalog data to prove the planner's logic. | Provides no correlation between system catalogs and the execution plan. |
| Optimization Implementation | Achieves >50% speedup using three distinct, well-reasoned tuning techniques. | Achieves speedup but relies entirely on basic B-tree indexes; ignores memory or cost parameters. | Fails to achieve significant speedup or uses brute-force methods without understanding. |
| Technical Reporting | Explanations are highly technical, clearly written, and correctly use PostgreSQL terminology (e.g., sequential scans, materialize nodes). | Explanations are adequate but occasionally use imprecise terminology. | Explanations are vague, lacking depth, or fundamentally misunderstand the executor pipeline. |