Syllabus
Prerequisites
This is an advanced systems course. Students are expected to have successfully completed a foundational Database Management Systems course (or equivalent industry experience).
- Required Knowledge: Proficiency in writing complex standard SQL (CTEs, Window Functions, Subqueries), a strong grasp of relational algebra, database normalization (up to 3NF), and basic data structure concepts (trees, hash tables).
- Helpful but not required: Familiarity with C programming (for the engine-extension chapter) and basic Linux command-line navigation.
Target Audience
- Upper-level computer science undergraduates or graduate students.
- Aspiring Database Administrators (DBAs), Data Engineers, and Backend Systems Engineers who need to optimize high-throughput data applications.
Course Structure & Time Commitment
This course is structured for high retention, utilizing a micro-learning format paired with intensive, hands-on engineering projects.
- Micro-Lessons: 13 core chapters divided into 15-minute focused lessons. (Approx. 65 total lessons / 16 hours of core lecture material).
- Assessments: Each lesson contains a "closed book" quiz of five multiple-choice questions plus one short-form written response, both completable with pen and paper. The course concludes with an open-book multiple-choice final covering all course content, designed to be completed in about one hour.
- Project Work: Three major lab assignments requiring deep forensic analysis and tuning of a live PostgreSQL instance.
- Estimated Total Commitment: 8–10 hours per week for a standard 14-week semester (roughly 110–140 total hours).
Core Learning Outcomes
By the end of this course, students will be able to:
- Deconstruct the Execution Pipeline: Trace the exact path of a SQL string from lexical analysis through the cost-based optimizer and into the Volcano execution engine.
- Optimize at the Physical Level: Read and interpret complex EXPLAIN (ANALYZE, BUFFERS) outputs to diagnose hardware-level bottlenecks, choosing the mathematically optimal access paths, join strategies, and index types (B-Tree, GiST, GIN, BRIN).
- Manage Concurrency and Bloat: Explain the exact C-level implementation of Multiversion Concurrency Control (MVCC) and snapshot isolation, and tune the autovacuum daemon to prevent transaction ID wraparound and storage bloat.
- Tune System Memory: Manipulate PostgreSQL’s dual-caching architecture by configuring shared_buffers, work_mem, and the clock-sweep eviction algorithm for specific OLTP or OLAP workloads.
- Architect for Scale and Disaster Recovery: Configure physical Write-Ahead Log (WAL) streaming replication, logical decoding, and declarative partitioning using Foreign Data Wrappers (FDW) to build distributed, highly available data systems.
Outline
Chapter 1: The Anatomy of PostgreSQL: Architecture and Process Model
This chapter provides a structural overview of the PostgreSQL database engine. Students will explore the client/server model, the postmaster process, backend processes, and the shared memory architecture. We will cover the division of labor between background workers (e.g., checkpointer, background writer, WAL writer) and the memory structures they interact with.
- 1.1: The Client/Server Model and Postmaster. Focuses on the role of the postmaster daemon, the fork/exec model for handling incoming connections, and connection pooling concepts.
- 1.2: Shared Memory Architecture. Maps out the shared memory segment, specifically locating shared_buffers, WAL buffers, and the commit log (CLOG).
- 1.3: Core Background Workers. A quick breakdown of the responsibilities of the checkpointer, background writer (bgwriter), and WAL writer.
- 1.4: Utility Processes. Covers the autovacuum launcher and the statistics collector (PgStat) architecture.
- 1.5: The Backend Process. Details the memory limits and lifecycle of an individual backend process serving a specific client session.
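The fork-per-connection model in 1.1 can be illustrated with a toy Python sketch. This is not PostgreSQL code; the `postmaster` and `handle_connection` names here are hypothetical stand-ins for the real postmaster loop and backend main loop, and each child "backend" simply returns a result via its exit status:

```python
import os

def handle_connection(conn_id):
    # Stand-in for a backend's query loop serving one client session.
    return conn_id * 2

def postmaster(connections):
    """Toy postmaster: fork one child 'backend' process per connection."""
    pids = {}
    results = {}
    for conn_id in connections:
        pid = os.fork()
        if pid == 0:
            # Child process: serve exactly one session, then exit.
            os._exit(handle_connection(conn_id))
        pids[pid] = conn_id
    for pid, conn_id in pids.items():
        _, status = os.waitpid(pid, 0)
        results[conn_id] = os.WEXITSTATUS(status)
    return results

print(postmaster([1, 2, 3]))  # each session handled by its own process
```

The key point mirrored here is isolation: a crash in one backend cannot corrupt another backend's private memory, which is why shared state must live in the shared memory segment covered in 1.2.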
Chapter 2: The Journey of a Query: Lexing, Parsing, and the Traffic Cop
We begin tracing the lifecycle of a query from the moment it hits the server. This chapter deconstructs the lexical analysis and parsing phases, where SQL text is transformed into an internal parse tree. Students will investigate how the "Traffic Cop" routes queries and handles utility versus optimizable statements.
- 2.1: The Query Lifecycle Overview. A high-level trace of a query from the wire protocol to the final result set.
- 2.2: Lexical Analysis. Explores how PostgreSQL uses Flex/Bison to tokenize raw SQL strings.
- 2.3: Analyzing the Parse Tree. Dissects the C-structures of a raw parse tree before any semantic meaning is applied.
- 2.4: Semantic Analysis. Explains how the analyzer checks catalogs to validate table/column names and creates the formal Query Tree.
- 2.5: The Traffic Cop. Shows how the engine routes simple utility commands (like CREATE TABLE) directly to execution, bypassing the planner.
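The tokenization step in 2.2 can be sketched with a few regex rules. This is a deliberately tiny stand-in for the Flex rules in PostgreSQL's `scan.l`, not the real lexer; the token names and rules below are illustrative assumptions:

```python
import re

# Hypothetical token rules -- a tiny stand-in for the Flex rules in scan.l.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:SELECT|FROM|WHERE)\b"),
    ("IDENT",   r"[a-z_][a-z0-9_]*"),
    ("NUMBER",  r"\d+"),
    ("OP",      r"[=<>*,]"),
    ("WS",      r"\s+"),
]

def tokenize(sql):
    """Turn a raw SQL string into (type, value) tokens, skipping whitespace."""
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    tokens = []
    for m in re.finditer(pattern, sql, re.IGNORECASE):
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("SELECT id FROM users WHERE age > 21"))
```

The Bison grammar then consumes this token stream to build the raw parse tree dissected in 2.3.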
Chapter 3: The PostgreSQL Rule System and Query Rewriting
Before a query is planned, it passes through the rewriter. This chapter explores how PostgreSQL implements views and rules by transforming parse trees into query trees. We will analyze the differences between the rule system and triggers, and how the rewriter flattens views for optimization.
- 3.1: Rules vs. Triggers. Defines the difference between logical query rewriting (rules) and physical row-level execution (triggers).
- 3.2: The Query Tree Structure. The Node types that make up a Query Tree (TargetList, FromExpr, etc.).
- 3.3: Views Under the Hood. Explains how PostgreSQL does not store views as physical objects, but rather as rewrite rules (ON SELECT DO INSTEAD).
- 3.4: The Rewriter in Action. A step-by-step trace of the rewriter flattening a view definition into the main Query Tree.
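The view flattening traced in 3.4 can be sketched as tree substitution. The dict-based "query trees" below are a gross simplification of PostgreSQL's Node structures, used only to show the shape of the transformation:

```python
# Toy query trees as dicts; a view is stored as a rewrite rule, not as data.
VIEWS = {
    "active_users": {"from": "users", "where": ["active = true"]},
}

def rewrite(query):
    """Flatten a view reference into the referencing query, like the rewriter."""
    target = query["from"]
    if target in VIEWS:
        rule = VIEWS[target]
        return {"from": rule["from"],
                "where": rule["where"] + query.get("where", [])}
    return query

q = {"from": "active_users", "where": ["age > 21"]}
print(rewrite(q))
# The view definition merges into the query; no "active_users" object is scanned.
```

After rewriting, the planner sees a single query over the base table with both predicates, which is what lets it optimize across the view boundary.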
Chapter 4: The Query Planner Part I: Statistics and Cost Estimation
An introduction to the cost-based optimizer. This chapter details how PostgreSQL predicts the computational cost of a query before running it. Students will evaluate the system catalogs (pg_class, pg_statistic), analyze how ANALYZE gathers data distributions (histograms, MCVs), and calculate how specific cost parameters (e.g., seq_page_cost, random_page_cost) explicitly dictate planner decisions.
- 4.1: The System Catalogs. A tour of pg_class and pg_attribute to see where the planner looks for table definitions and page counts.
- 4.2: The Role of ANALYZE. Explains the sampling algorithm ANALYZE uses to gather table statistics without scanning every row.
- 4.3: Understanding pg_statistic. Breaks down Most Common Values (MCVs) and histograms to understand how Postgres calculates data distribution.
- 4.4: Cost Constants. Defines the arbitrary mathematical weights Postgres uses, such as seq_page_cost and random_page_cost.
- 4.5: Calculating Cost. A practical math exercise showing exactly how the planner calculates the total cost of a simple sequential scan.
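The cost arithmetic exercised in 4.5 can be previewed in a few lines of Python. The constants are PostgreSQL's documented defaults; the formula is the simplified sequential-scan model (disk term plus per-tuple CPU terms), omitting refinements the real costsize.c applies:

```python
# Planner cost constants (PostgreSQL defaults).
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025

def seq_scan_cost(pages, tuples, quals=0):
    """Simplified sequential scan cost:
    disk cost + per-tuple CPU cost + per-tuple qual evaluation cost."""
    return (pages * SEQ_PAGE_COST
            + tuples * CPU_TUPLE_COST
            + tuples * quals * CPU_OPERATOR_COST)

# A 1000-page table with 100,000 rows and one WHERE clause:
print(seq_scan_cost(1000, 100_000, quals=1))  # 1000 + 1000 + 250 = 2250.0
```

Note that the result is a unitless number: costs only have meaning relative to each other, which is why the ratio between seq_page_cost and random_page_cost matters more than their absolute values.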
Chapter 5: The Query Planner Part II: Path Generation and GEQO
Building on cost estimation, this chapter covers how the planner generates and evaluates different execution paths. We will dissect the algorithms behind sequential scans vs. index scans, and analyze join strategies (Nested Loop, Hash Join, Merge Join). The chapter concludes with a look at Genetic Query Optimization (GEQO) for handling complex queries with many joins.
- 5.1: Path Generation Basics. Explains the difference between Access Paths (how to read a table) and Join Paths (how to connect tables).
- 5.2: Index Scan Paths. How the planner determines if a B-Tree index scan is cheaper than a sequential scan based on selectivity.
- 5.3: Join Strategies I. Details the mechanics and cost models of Nested Loop and Hash Joins.
- 5.4: Join Strategies II. Details the mechanics of Merge Joins and how the planner uses explicit or implicit sorts.
- 5.5: Dynamic Programming in the Planner. Explores the System R-style bottom-up dynamic programming used to evaluate join orders.
- 5.6: Genetic Query Optimization (GEQO). Explains the fallback heuristic algorithm used when a query exceeds geqo_threshold (usually 12+ joins).
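The bottom-up dynamic programming from 5.5 can be sketched compactly. The row counts, join cost function, and selectivity rule below are invented for illustration; the real planner derives all three from catalog statistics:

```python
from itertools import combinations

# Hypothetical base-relation row counts; the planner reads these from pg_class.
REL_ROWS = {"a": 1000, "b": 100, "c": 10}

def join_cost(left_rows, right_rows):
    # Toy cost model: roughly the cost of reading both join inputs.
    return left_rows + right_rows

def best_join_order(rels):
    """System R-style bottom-up DP: best plan for every subset of relations."""
    best = {frozenset([r]): (0, REL_ROWS[r], r) for r in rels}
    for size in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, size)):
            for r in subset:
                sub_cost, sub_rows, sub_plan = best[subset - {r}]
                cost = sub_cost + join_cost(sub_rows, REL_ROWS[r])
                rows = max(sub_rows, REL_ROWS[r]) // 10  # fake selectivity
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, rows, (sub_plan, r))
    return best[frozenset(rels)]

print(best_join_order(["a", "b", "c"]))
```

Because the table of subsets grows exponentially with the number of relations, this is exactly the search space GEQO (5.6) replaces with a genetic heuristic once geqo_threshold is exceeded.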
Chapter 6: The Executor: Processing the Plan Tree
Once the cheapest path is chosen, the plan tree is handed to the Executor. Students will study PostgreSQL's demand-driven, pipeline execution model (the Volcano model). We will examine the mechanics of specific execution nodes (e.g., Agg, Sort, Limit) and how tuples are fetched, processed, and passed up the tree to the client.
- 6.1: The Volcano Execution Model. Explains the demand-driven, pipeline execution model where parent nodes pull tuples from child nodes.
- 6.2: Executor Phases. Defines the InitPlan, ExecProcNode, and EndPlan phases of executor initialization and teardown.
- 6.3: Scan Nodes. Traces the exact C-functions used to fetch a physical tuple from disk or memory during a scan.
- 6.4: Join and Materialize Nodes. Shows how the executor creates in-memory hash tables or materializes subqueries to process joins.
- 6.5: Aggregate and Sort Nodes. Contrasts streaming operations with blocking operations (like SORT) that must consume all input before outputting rows.
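The pull-based pipeline from 6.1 maps naturally onto Python generators. This is a conceptual sketch, not the executor's C interface; the node names are hypothetical, but the demand-driven control flow is the point:

```python
# Each node is a generator: the parent "pulls" one tuple at a time from its
# child -- the demand-driven Volcano model.

def seq_scan(table):
    for row in table:        # leaf node: yields tuples one at a time
        yield row

def filter_node(child, predicate):
    for row in child:        # pulls from the child only when asked for a row
        if predicate(row):
            yield row

def limit_node(child, n):
    for i, row in enumerate(child):
        if i >= n:
            return           # stops pulling early: the rest is never scanned
        yield row

table = [{"id": i, "age": 18 + i} for i in range(100)]
plan = limit_node(filter_node(seq_scan(table), lambda r: r["age"] > 20), 3)
print(list(plan))
```

Note how LIMIT terminates the whole pipeline after three matching rows: the sequential scan never touches the remaining rows, which is precisely why streaming nodes compose so cheaply, while blocking nodes like Sort (6.5) must consume all input first.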
Chapter 7: Advanced Indexing Under the Hood
Moving beyond standard B-trees, this chapter explores the physical structure and use cases of PostgreSQL's specialized index types. We will dissect GiST (Generalized Search Tree), GIN (Generalized Inverted Index), BRIN (Block Range Index), and SP-GiST. Students will learn how to write operator classes and how the engine determines index applicability.
- 7.1: B-Tree Internals. Explores page layouts, right-links, and page splitting in PostgreSQL's standard index.
- 7.2: GiST Indexes. Analyzes Generalized Search Trees, focusing on bounding boxes and spatial data (PostGIS).
- 7.3: GIN Indexes. Analyzes Generalized Inverted Indexes, explaining how they map individual elements (array values, JSONB keys) to the rows that contain them.
- 7.4: BRIN Indexes. Explains Block Range Indexes, highlighting their tiny footprint and use in sequential time-series data.
- 7.5: Operator Classes. A brief introduction to how PostgreSQL allows developers to teach indexes how to sort or evaluate custom data types.
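The BRIN design from 7.4 reduces to a very small amount of logic, sketched below. The structure is a simplification (real BRIN stores its summaries in index pages and consults the visibility of each range), but it shows why the index is tiny and why it only helps when values correlate with physical block order:

```python
# Toy BRIN index: per-block-range min/max summaries over a column.
PAGES_PER_RANGE = 4

def build_brin(blocks):
    """Summarize each range of blocks as (min, max) of the values it holds."""
    index = []
    for i in range(0, len(blocks), PAGES_PER_RANGE):
        chunk = [v for block in blocks[i:i + PAGES_PER_RANGE] for v in block]
        index.append((min(chunk), max(chunk)))
    return index

def ranges_to_scan(index, value):
    """Return which block ranges might contain `value`; the rest are skipped."""
    return [i for i, (lo, hi) in enumerate(index) if lo <= value <= hi]

# Time-series-style data: values correlate with physical block order.
blocks = [[b * 10 + o for o in range(10)] for b in range(16)]
idx = build_brin(blocks)
print(ranges_to_scan(idx, 57))  # only one of four ranges needs scanning
```

If the same values were scattered randomly across blocks, every (min, max) summary would span nearly the whole domain and no range could ever be skipped, which is the classic BRIN failure mode.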
Chapter 8: Multiversion Concurrency Control (MVCC) and Vacuuming
An intensive exploration of transactional integrity without locking overhead. This chapter covers the hidden system columns (xmin, xmax), snapshot isolation, and how PostgreSQL resolves row visibility. We will examine the physical bloat caused by MVCC and the inner workings of the VACUUM and autovacuum processes to reclaim space and prevent transaction ID wraparound.
- 8.1: The MVCC Concept. Explains the philosophy of why readers don't block writers through the use of immutable row versions.
- 8.2: Tuple Header Anatomy. Dissects the hidden system columns: xmin, xmax, cmin, cmax, and ctid.
- 8.3: Snapshot Isolation. Explains how Postgres captures a list of active transactions to determine which tuples are visible to a specific query.
- 8.4: Transaction ID (XID) Wraparound. Explains the 32-bit integer limit of XIDs, the concept of "frozen" tuples, and the danger of wraparound.
- 8.5: The VACUUM Process. Details how VACUUM identifies dead tuples, updates the Free Space Map (FSM), and reclaims space.
- 8.6: Autovacuum Tuning. Covers the math behind autovacuum_vacuum_scale_factor and autovacuum_vacuum_threshold.
- 8.7: Assignment Reminder. This course contains three project-based assignments that can be completed at any time. Now is a good time to complete Project 2: Forensic Analysis of MVCC and Vacuuming.
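The visibility rules from 8.2 and 8.3 can be condensed into a toy check. This sketch invents a simplified snapshot (an `xmin_horizon` plus an in-progress set) and ignores hint bits, the CLOG, and subtransactions, all of which the real heapam_visibility.c code must consult:

```python
# Minimal snapshot-visibility check over toy tuples with xmin/xmax headers.

def is_visible(tup, snapshot):
    """A tuple is visible if its inserting transaction committed before the
    snapshot and it was not deleted by a transaction the snapshot can see."""
    def committed_before(xid):
        return (xid is not None
                and xid < snapshot["xmin_horizon"]
                and xid not in snapshot["in_progress"])
    if not committed_before(tup["xmin"]):
        return False               # inserter aborted, in progress, or too new
    return not committed_before(tup["xmax"])   # visible unless deleter committed

snapshot = {"xmin_horizon": 100, "in_progress": {90}}
old_row     = {"xmin": 50, "xmax": None}   # inserted long ago, never deleted
deleted_row = {"xmin": 50, "xmax": 80}     # deleted by a committed transaction
uncommitted = {"xmin": 90, "xmax": None}   # inserting transaction still running
print([is_visible(t, snapshot) for t in (old_row, deleted_row, uncommitted)])
```

Note that `deleted_row` still occupies space on its heap page even though no snapshot can see it; reclaiming such dead tuples is precisely the job of VACUUM (8.5).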
Chapter 9: Memory Management and Caching Strategies
This chapter covers how PostgreSQL interacts with RAM. We will contrast PostgreSQL's shared_buffers with the operating system page cache (double buffering). Students will learn how work_mem dictates the performance of sorts and hash tables, and how the clock-sweep algorithm handles buffer eviction.
- 9.1: The Dual Caching Model. Contrasts PostgreSQL's shared_buffers with the Operating System's Page Cache.
- 9.2: The Buffer Manager. Explains how physical disk blocks are mapped to buffer tags in shared memory.
- 9.3: Eviction Policies. Details the Clock-Sweep algorithm PostgreSQL uses to determine which pages to evict when buffers are full.
- 9.4: Local Memory (work_mem). Explains how local memory dictates the performance of sorts, hashes, and bitmap index scans, and the risk of spilling to disk.
- 9.5: Maintenance Memory. Covers maintenance_work_mem and its impact on index creation and VACUUM speed.
- 9.6: Assignment Reminder. This course contains three project-based assignments that can be completed at any time. Now is a good time to complete Project 1: Deconstructing the Query Planner and Execution Pipeline.
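The clock-sweep eviction from 9.3 can be simulated directly. The class below is a toy model, not the buffer manager's actual data structures, but it captures the second-chance behavior: a sweep decrements usage counts and only evicts a buffer whose count has reached zero:

```python
# Clock-sweep buffer eviction sketch: each buffer has a usage count that the
# clock hand decrements as it passes; a buffer is evicted only at zero.

class BufferPool:
    def __init__(self, size):
        self.pages = [None] * size      # which disk page each buffer holds
        self.usage = [0] * size
        self.hand = 0

    def access(self, page):
        if page in self.pages:          # cache hit: bump the usage count
            i = self.pages.index(page)
            self.usage[i] = min(self.usage[i] + 1, 5)  # cap, as in PostgreSQL
            return "hit"
        # Miss: sweep the clock hand until a usage count of 0 is found.
        while True:
            i = self.hand
            self.hand = (self.hand + 1) % len(self.pages)
            if self.pages[i] is None or self.usage[i] == 0:
                self.pages[i] = page
                self.usage[i] = 1
                return "miss"
            self.usage[i] -= 1          # give the buffer a second chance

pool = BufferPool(3)
print([pool.access(p) for p in ["A", "B", "A", "C", "D", "A"]])
```

In the trace above, page A survives the eviction triggered by D because its extra access raised its usage count, so the sweep evicts the less-used B instead: frequently touched pages stay resident.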
Chapter 10: The Write-Ahead Log (WAL) and Crash Recovery
An exploration of durability. Students will learn the physical layout of the WAL, how physiological logging works, and the mechanics of the checkpointer process. We will walk through the exact sequence of events during crash recovery and how PostgreSQL replays the WAL to guarantee the ACID property of Durability.
- 10.1: The Purpose of WAL. Explains the concept of Durability, fsync, and why appending to a sequential log is faster than random disk writes.
- 10.2: WAL Physical Structure. Details 16MB WAL segments, Log Sequence Numbers (LSN), and WAL records.
- 10.3: Physiological Logging. Explains PostgreSQL's approach to logging: physical at the page level, logical at the row level.
- 10.4: Checkpoints. Details the checkpointer process, how it flushes dirty buffers, and updates the pg_control file.
- 10.5: Crash Recovery Mechanics. A step-by-step trace of how Postgres reads the pg_control file and replays the WAL from the last checkpoint after a crash.
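The recovery sequence from 10.5 can be modeled in miniature. The record format and integer LSNs below are stand-ins for real WAL records and Log Sequence Numbers; the point is the invariant that a checkpoint image plus replay of all later records reconstructs the pre-crash state:

```python
# Toy WAL: an append-only log of page changes, replayed from a checkpoint.

def recover(wal, checkpoint_lsn, checkpoint_image):
    """Crash recovery: start from the checkpoint's page image and replay
    every WAL record whose LSN is at or past the checkpoint's redo point."""
    pages = dict(checkpoint_image)
    for lsn, record in enumerate(wal):
        if lsn >= checkpoint_lsn:
            pages[record["page"]] = record["value"]
    return pages

wal = [
    {"page": "p1", "value": "a"},
    {"page": "p2", "value": "b"},   # checkpoint taken after this record
    {"page": "p1", "value": "c"},   # these two records existed only in the
    {"page": "p3", "value": "d"},   # WAL when the crash happened
]
checkpoint = {"p1": "a", "p2": "b"}
print(recover(wal, checkpoint_lsn=2, checkpoint_image=checkpoint))
```

This also shows why checkpoint frequency is a recovery-time knob: the further back the redo point, the more records must be replayed before the database can accept connections again.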
Chapter 11: Replication: Physical and Logical
This chapter shifts focus to data distribution and high availability. We will compare physical streaming replication (byte-for-byte copy) with logical replication (publish/subscribe at the row level). Students will explore replication slots, the walreceiver/walsender processes, and logical decoding plugins.
- 11.1: Physical Streaming Replication. Explains the mechanics of streaming byte-for-byte WAL records to a standby server.
- 11.2: WalSender and WalReceiver. The specific background processes that facilitate the replication connection.
- 11.3: Replication Slots. How slots prevent the primary server from recycling WAL segments before the replica has consumed them.
- 11.4: Logical Replication Basics. Contrasts physical replication with the Publish/Subscribe model of row-level logical replication.
- 11.5: Logical Decoding. Explains how output plugins (like pgoutput) decode binary WAL into JSON or raw SQL statements.
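The retention guarantee from 11.3 reduces to a minimum over slot positions. The sketch below uses plain integers for WAL segment numbers and a simplified slot record; real slots track byte-precise LSNs, but the recycling rule is the same:

```python
# Replication slot sketch: the primary may only recycle WAL segments that
# every slot has confirmed as replayed (its restart position).

def recyclable_segments(segments, slots):
    """Segments below the minimum restart position across all slots
    are safe to remove; everything else must be retained."""
    if not slots:
        return segments
    horizon = min(slot["restart_lsn"] for slot in slots)
    return [s for s in segments if s < horizon]

segments = [1, 2, 3, 4, 5]
slots = [{"name": "replica_a", "restart_lsn": 4},
         {"name": "replica_b", "restart_lsn": 3}]  # a lagging replica
print(recyclable_segments(segments, slots))  # only segments 1 and 2
```

The minimum is what makes an abandoned slot dangerous in production: one dead replica pins the horizon forever, and the primary's WAL directory grows without bound until the slot is dropped.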
Chapter 12: Distributed PostgreSQL and Sharding
Scaling PostgreSQL beyond a single node. This chapter covers native declarative partitioning, Foreign Data Wrappers (FDW), and parallel query execution. We will also examine architectures like Citus that transform PostgreSQL into a distributed database, discussing distributed query planning and two-phase commit protocols.
- 12.1: Declarative Partitioning. Covers the syntax and internal routing of Range, List, and Hash partitioning.
- 12.2: Partition Pruning. Explains how the planner (and executor) intelligently skips scanning partitions that don't match the query bounds.
- 12.3: Foreign Data Wrappers (FDW). The architecture of postgres_fdw and how the planner pushes down operations to remote servers.
- 12.4: Parallel Query Execution. Explains Gather nodes, parallel worker processes, and how the engine divides table scans among CPU cores.
- 12.5: Introduction to Sharding. Discusses distributed database architectures (like Citus) and the complexities of Distributed Transactions / Two-Phase Commit.
- 12.6: Assignment Reminder. This course contains three project-based assignments that can be completed at any time. Now is a good time to complete Project 3: Architecting for Scale with Partition Pruning and FDW.
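The pruning logic from 12.2 is, at its core, an interval-overlap test against partition bounds. The partition table below is a made-up example; the half-open [lo, hi) convention matches how PostgreSQL range partitions define their bounds:

```python
# Range-partition pruning sketch: skip partitions whose bounds cannot match.

PARTITIONS = [
    {"name": "events_2022", "lo": "2022-01-01", "hi": "2023-01-01"},
    {"name": "events_2023", "lo": "2023-01-01", "hi": "2024-01-01"},
    {"name": "events_2024", "lo": "2024-01-01", "hi": "2025-01-01"},
]

def prune(partitions, lo, hi):
    """Keep only partitions whose [lo, hi) range overlaps the query bounds."""
    return [p["name"] for p in partitions if p["lo"] < hi and lo < p["hi"]]

# A query over ts >= '2023-06-01' AND ts < '2024-02-01' touches two partitions:
print(prune(PARTITIONS, "2023-06-01", "2024-02-01"))
```

Because the comparison only needs the query's WHERE bounds and the partition bounds, pruning can happen at plan time, and since PostgreSQL 11 it can also happen at execution time when the bounds come from parameters.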
Chapter 13: Extending the Engine
The course concludes by exploring PostgreSQL's extensibility. Students will learn the architecture behind creating custom data types, writing background workers in C, and leveraging extension hooks to modify the parser, planner, or executor behavior without altering the core source code.
- 13.1: Extensibility Architecture. Explains the design philosophy that allows Postgres to be modified without recompiling the core kernel.
- 13.2: Custom Data Types. An overview of the C-language interface required to define physical storage and operators for a new data type.
- 13.3: Background Worker Processes. How to write and register a custom daemon process that hooks into the postmaster lifecycle.
- 13.4: Extension Hooks. Explains how to use planner_hook and executor_hook to intercept and modify engine behavior on the fly.
- 13.5: Packaging Extensions. The basics of the PGXS build infrastructure, control files, and creating standard CREATE EXTENSION modules.
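The hook-chaining idiom from 13.4 can be sketched outside of C. The sketch below models planner_hook as a global function slot: an extension saves the previous value and chains through it, which mirrors the convention PostgreSQL extensions follow in _PG_init, though the names here are simplified:

```python
# Hook chaining sketch: an extension saves the previous hook and wraps it.

def standard_planner(query):
    return f"plan({query})"

planner_hook = None   # global hook slot; None means "use the standard planner"

def plan(query):
    return planner_hook(query) if planner_hook else standard_planner(query)

def install_logging_hook():
    """Extension load: remember the previous hook and chain through it."""
    global planner_hook
    prev = planner_hook
    def logging_planner(query):
        result = prev(query) if prev else standard_planner(query)
        return f"logged[{result}]"
    planner_hook = logging_planner

print(plan("SELECT 1"))     # plan(SELECT 1)
install_logging_hook()
print(plan("SELECT 1"))     # logged[plan(SELECT 1)]
```

Chaining through the saved previous hook is what lets multiple extensions stack their behavior: each wrapper delegates inward, and the standard planner always sits at the bottom of the chain.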