Quick Overview of PostgreSQL’s Table Access Method

What is a Table Access Method?

The table access method is the interface between the PostgreSQL core and data storage management. Since PostgreSQL 12, it has been possible to define your own custom table access method that stores data in a custom format by implementing over 45 interface API callback functions. Implementing all of these callbacks is generally a difficult task, because you are essentially writing your own storage engine, which has to cooperate with the PostgreSQL core to support:

  • sequential scan
  • parallel scan
  • index fetch
  • query estimate
  • insert, update, delete, truncate
  • table creation, vacuum, vacuum full
  • TOAST, etc.
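To make the idea of a callback interface concrete, here is a minimal C sketch of the pattern. It is not PostgreSQL's actual API: ToyRow, ToyTableAm, ToyMemRel, toy_mem_handler, and toy_core_seqscan_sum are hypothetical names. The real callback table is TableAmRoutine, and its callbacks take a Relation, a scan descriptor, and a TupleTableSlot rather than the plain ToyRow used here. What the sketch shows is that "core" code can run a sequential scan purely through function pointers, without knowing how the rows are stored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical row type; real table AM callbacks exchange TupleTableSlots. */
typedef struct ToyRow { int id; int value; } ToyRow;

/* Hypothetical, heavily simplified stand-in for TableAmRoutine: the "core"
 * only ever sees this table of callbacks, never the storage layout. */
typedef struct ToyTableAm {
    const char *name;
    bool (*tuple_insert)(void *rel, const ToyRow *row);
    void (*scan_begin)(void *rel, size_t *cursor);
    bool (*scan_getnextslot)(void *rel, size_t *cursor, ToyRow *out);
} ToyTableAm;

/* One possible "storage engine": rows kept in a fixed-size in-memory array. */
#define TOY_CAPACITY 16
typedef struct ToyMemRel { ToyRow rows[TOY_CAPACITY]; size_t nrows; } ToyMemRel;

static bool mem_tuple_insert(void *rel, const ToyRow *row) {
    ToyMemRel *r = rel;
    if (r->nrows == TOY_CAPACITY)
        return false;                     /* storage full */
    r->rows[r->nrows++] = *row;
    return true;
}

static void mem_scan_begin(void *rel, size_t *cursor) {
    (void) rel;
    *cursor = 0;                          /* start at the first row */
}

static bool mem_scan_getnextslot(void *rel, size_t *cursor, ToyRow *out) {
    ToyMemRel *r = rel;
    if (*cursor >= r->nrows)
        return false;                     /* end of scan */
    *out = r->rows[(*cursor)++];
    return true;
}

/* The callback table for this toy engine. */
static const ToyTableAm toy_mem_methods = {
    .name = "toy_mem",
    .tuple_insert = mem_tuple_insert,
    .scan_begin = mem_scan_begin,
    .scan_getnextslot = mem_scan_getnextslot,
};

/* Handler: gives the core a pointer to the callback table. */
const ToyTableAm *toy_mem_handler(void) { return &toy_mem_methods; }

/* "Core" code: a sequential scan that sums values via callbacks only. */
long toy_core_seqscan_sum(const ToyTableAm *am, void *rel) {
    size_t cursor;
    ToyRow row;
    long sum = 0;
    assert(am != NULL && am->scan_begin != NULL && am->scan_getnextslot != NULL);
    am->scan_begin(rel, &cursor);
    while (am->scan_getnextslot(rel, &cursor, &row))
        sum += row.value;
    return sum;
}
```

Swapping in a different storage engine would mean providing another static callback table and another handler, while toy_core_seqscan_sum stays unchanged. heapam_handler.c follows the same shape: heap_tableam_handler() returns a pointer to the static heapam_methods table.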

OrioleDB, for example, is a custom storage engine for PostgreSQL, implemented through the table access method interface, that aims to provide a more modern way to store data.

Heap is the default table access method, and the only one that ships with core PostgreSQL.

Interface APIs

The interface APIs are defined in src/include/access/tableam.h, and the heap access method's implementation is located in src/backend/access/heap/heapam_handler.c.

Different interface callback functions serve different purposes. If you examine them closely, you will notice that the functions involving data retrieval (scans) and data modification (insert, update, etc.) exchange data with the PostgreSQL core through a data structure called the Tuple Table Slot (TTS).

Regardless of whether you are using the heap access method or defining your own, you will receive the data as a TTS. It is the access method's responsibility to understand this structure and convert it into the format (for example, a heap tuple) that is physically stored on disk.

Likewise, when PostgreSQL requests data from the access method, the access method is responsible for converting the stored data back into a TTS.
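A minimal sketch of that two-way conversion follows, under simplified assumptions. ToySlot stands in for a TTS (a column count plus a values array and a null-flag array). ToyStoredTuple is a made-up storage format that keeps a null bitmap and packs only the non-null values. Neither type exists in PostgreSQL, and neither do toy_slot_to_stored or toy_stored_to_slot. They only illustrate where the access method's conversion work sits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_ATTS 8

/* Hypothetical, simplified slot: what the executor hands to the AM. */
typedef struct ToySlot {
    int natts;
    int64_t values[TOY_MAX_ATTS];   /* stand-in for the datum array */
    bool isnull[TOY_MAX_ATTS];      /* one flag per column (the NULL array) */
} ToySlot;

/* Hypothetical storage format chosen by this toy AM: a null bitmap plus
 * only the non-null values, packed together. In this toy format,
 * bit i set means column i is NULL. */
typedef struct ToyStoredTuple {
    uint8_t natts;
    uint8_t nullbits;
    uint8_t nstored;                /* number of packed values */
    int64_t data[TOY_MAX_ATTS];
} ToyStoredTuple;

/* Write path (e.g. insert): slot -> storage format. */
ToyStoredTuple toy_slot_to_stored(const ToySlot *slot) {
    ToyStoredTuple tup;
    memset(&tup, 0, sizeof tup);
    assert(slot->natts >= 0 && slot->natts <= TOY_MAX_ATTS);
    tup.natts = (uint8_t) slot->natts;
    for (int i = 0; i < slot->natts; i++) {
        if (slot->isnull[i])
            tup.nullbits |= (uint8_t) (1u << i);
        else
            tup.data[tup.nstored++] = slot->values[i];
    }
    return tup;
}

/* Read path (e.g. scan): storage format -> slot. */
void toy_stored_to_slot(const ToyStoredTuple *tup, ToySlot *slot) {
    int next = 0;                   /* index into the packed values */
    slot->natts = tup->natts;
    for (int i = 0; i < tup->natts; i++) {
        slot->isnull[i] = (tup->nullbits >> i) & 1u;
        slot->values[i] = slot->isnull[i] ? 0 : tup->data[next++];
    }
}
```

For example, a slot holding (7, NULL, 9) is stored as two packed values with bit 1 of nullbits set. Reading it back restores the original three columns.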

What is Tuple Table Slot (TTS)?

It is the data format understood by PostgreSQL (specifically, the executor module).

  • TTS is an internal data structure that holds a single row of data, including its column values.
  • It is a basic building block of query processing.
  • It is used to store rows returned by queries, as well as rows to be inserted or updated.
  • It is the common data format between the executor module and the table access method.
  • Its life cycle follows query processing.
  • TTS operation callbacks (TupleTableSlotOps) tell PostgreSQL how to convert a TTS to and from a heap tuple or other tuple formats.
  • The structure is defined in src/include/executor/tuptable.h.
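One detail worth illustrating is how a slot's operation callbacks let the executor extract ("deform") columns lazily. In tuptable.h, slot_getattr() calls slot_getsomeattrs() only when the requested column is beyond tts_nvalid, the number of columns already extracted. The sketch below models that behavior with hypothetical types (ToySlot, ToySlotOps, ToyDatum) whose field names loosely follow TupleTableSlot. The real structure also carries a tuple descriptor, memory context, TID, and table OID, and its ops include more callbacks than getsomeattrs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ATTS 8

typedef int64_t ToyDatum;            /* stand-in for PostgreSQL's Datum */
typedef struct ToySlot ToySlot;

/* Hypothetical counterpart of TupleTableSlotOps: per-slot-type callbacks. */
typedef struct ToySlotOps {
    void (*getsomeattrs)(ToySlot *slot, int natts);  /* deform first natts columns */
} ToySlotOps;

/* Simplified counterpart of TupleTableSlot. */
struct ToySlot {
    const ToySlotOps *tts_ops;
    int natts;                       /* column count (from the descriptor) */
    int tts_nvalid;                  /* leading columns already deformed */
    ToyDatum tts_values[TOY_MAX_ATTS];
    bool tts_isnull[TOY_MAX_ATTS];
    const ToyDatum *source;          /* hypothetical: the underlying stored row */
    int deform_calls;                /* instrumentation for this example */
};

/* Toy slot type: values come from a plain array, and a negative number
 * means NULL (purely this example's convention). */
static void array_getsomeattrs(ToySlot *slot, int natts) {
    assert(natts <= slot->natts);
    for (int i = slot->tts_nvalid; i < natts; i++) {
        slot->tts_isnull[i] = slot->source[i] < 0;
        slot->tts_values[i] = slot->tts_isnull[i] ? 0 : slot->source[i];
    }
    slot->tts_nvalid = natts;
    slot->deform_calls++;
}

static const ToySlotOps ToyArraySlotOps = { .getsomeattrs = array_getsomeattrs };

void toy_slot_init(ToySlot *slot, const ToyDatum *row, int natts) {
    assert(natts >= 0 && natts <= TOY_MAX_ATTS);
    slot->tts_ops = &ToyArraySlotOps;
    slot->natts = natts;
    slot->tts_nvalid = 0;            /* nothing deformed yet */
    slot->source = row;
    slot->deform_calls = 0;
}

/* Modeled on slot_getattr(): attnum is 1-based; deform only on demand. */
ToyDatum toy_slot_getattr(ToySlot *slot, int attnum, bool *isnull) {
    assert(attnum >= 1 && attnum <= slot->natts);
    if (attnum > slot->tts_nvalid)
        slot->tts_ops->getsomeattrs(slot, attnum);
    *isnull = slot->tts_isnull[attnum - 1];
    return slot->tts_values[attnum - 1];
}
```

Reading column 1 deforms only the first column. A later read of column 2 extends tts_nvalid to 2, and re-reading column 1 triggers no further deforming.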

What is Heap Tuple?

It is the format in which the heap access method stores rows on disk.

  • Row Representation: Heap tuples are the physical storage representation of rows in a PostgreSQL table. Each heap tuple contains the actual data values for each column in a row.
  • Visibility Information: Heap tuples include metadata to track their visibility, such as xmin, xmax, cid, and hint-bit flags. This is crucial for PostgreSQL's Multi-Version Concurrency Control (MVCC) system, allowing transactions to work with consistent snapshots of the data.
  • Support for Updates: When a row is updated, PostgreSQL does not overwrite the old heap tuple. It sets the old tuple's xmax to the updating transaction's ID, which marks that version as deleted once the transaction commits (it becomes "dead" when no snapshot can still see it), and creates a new version of the tuple with the updated data. This versioning supports data consistency.
  • Indexed for Efficiency: Each heap tuple is identified by a tuple ID (TID, exposed as the ctid system column), which records its physical location: the page (block) number and the offset within that page. An index entry stores the TID of the heap tuple it points to, so PostgreSQL can fetch that tuple directly instead of scanning the entire table to find a match.
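The update and visibility rules above can be sketched with a deliberately simplified model. Everything here is hypothetical: ToyHeapTuple, ToyTid, ToySnapshot, toy_tuple_visible, and toy_heap_update. The toy snapshot treats every transaction ID below its xmax as committed. The sketch ignores aborted and in-progress transactions, command IDs, hint bits, and XID wraparound, all of which the real heap visibility code has to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;                 /* stand-in for TransactionId */
#define TOY_INVALID_XID 0u

/* Simplified TID: page (block) number plus offset within that page. */
typedef struct ToyTid { uint32_t block; uint16_t offset; } ToyTid;

/* Hypothetical, simplified heap tuple: header fields plus one user column. */
typedef struct ToyHeapTuple {
    ToyXid xmin;        /* inserting transaction */
    ToyXid xmax;        /* deleting/updating transaction, or invalid */
    ToyTid self;        /* where this version lives */
    ToyTid ctid;        /* self, or the TID of the newer version */
    int value;
} ToyHeapTuple;

/* Greatly simplified snapshot: every xid below xmax counts as committed. */
typedef struct ToySnapshot { ToyXid xmax; } ToySnapshot;

/* True if xid is valid and this toy snapshot treats it as committed. */
static bool toy_xid_committed_for(ToyXid xid, const ToySnapshot *snap) {
    return xid != TOY_INVALID_XID && xid < snap->xmax;
}

bool toy_tuple_visible(const ToyHeapTuple *tup, const ToySnapshot *snap) {
    if (!toy_xid_committed_for(tup->xmin, snap))
        return false;                    /* inserted "after" the snapshot */
    return !toy_xid_committed_for(tup->xmax, snap);  /* not deleted for us */
}

/* UPDATE: stamp the old version's xmax, build a new version at new_tid,
 * and chain old -> new through ctid. Nothing is overwritten in place. */
ToyHeapTuple toy_heap_update(ToyHeapTuple *old, ToyXid xid, ToyTid new_tid, int new_value) {
    assert(old->xmax == TOY_INVALID_XID);  /* toy model: no concurrent updaters */
    ToyHeapTuple newtup = { .xmin = xid, .xmax = TOY_INVALID_XID,
                            .self = new_tid, .ctid = new_tid, .value = new_value };
    old->xmax = xid;
    old->ctid = new_tid;
    return newtup;
}
```

With a tuple inserted by transaction 100 and updated by transaction 200, a snapshot with xmax 150 still sees the old version. A snapshot with xmax 250 sees only the new version, which can be reached from the old one through its ctid.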

Tuple Table Slot vs Heap Tuple

Tuple Table Slot

  • data format used by the executor module in the PostgreSQL core
  • an in-memory structure that includes:
    • total number of columns
    • description of columns (tuple descriptor)
    • flags
    • datum array
    • NULL array
    • etc.

Heap Tuple

  • data format used by the heap access method
  • the on-disk format, which includes:
    • visibility information
    • flags
    • offset
    • physical location
    • actual user data
    • etc.
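The two null representations differ in a concrete way. A TTS keeps one bool per column in its NULL array, while a heap tuple stores a null bitmap in its header, in which a set bit means the column is not null (the same test PostgreSQL's att_isnull() performs). The sketch below uses a hypothetical, simplified header type, ToyHeapTupleHeader, whose field names loosely follow HeapTupleHeaderData. The HASNULL and NATTS_MASK constants use the same values as HEAP_HASNULL and HEAP_NATTS_MASK in htup_details.h, but the struct is not the real on-disk layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same values as HEAP_HASNULL and HEAP_NATTS_MASK in htup_details.h. */
#define TOY_HEAP_HASNULL     0x0001   /* t_infomask: tuple has a null bitmap */
#define TOY_HEAP_NATTS_MASK  0x07FF   /* t_infomask2: low 11 bits = column count */
#define TOY_MAX_ATTS 16

/* Simplified header loosely following HeapTupleHeaderData (the real one uses
 * a union, a full ItemPointerData and a flexible t_bits array). */
typedef struct ToyHeapTupleHeader {
    uint32_t t_xmin, t_xmax, t_cid;      /* visibility information */
    uint32_t t_ctid_block;               /* physical location: page ... */
    uint16_t t_ctid_offset;              /* ... and offset within the page */
    uint16_t t_infomask2;                /* column count + flags */
    uint16_t t_infomask;                 /* flags such as HASNULL */
    uint8_t  t_hoff;                     /* offset to user data (unused here) */
    uint8_t  t_bits[(TOY_MAX_ATTS + 7) / 8];  /* null bitmap */
} ToyHeapTupleHeader;

/* Same test as att_isnull(): a SET bit means the column is NOT null. */
static bool toy_att_isnull(int att, const uint8_t *bits) {
    return !(bits[att >> 3] & (1 << (att & 0x07)));
}

/* Fill the column count and null bitmap from a TTS-style isnull array. */
void toy_header_from_isnull(ToyHeapTupleHeader *h, const bool *isnull, int natts) {
    assert(natts >= 0 && natts <= TOY_MAX_ATTS);
    memset(h->t_bits, 0, sizeof h->t_bits);
    h->t_infomask2 = (uint16_t) ((h->t_infomask2 & ~TOY_HEAP_NATTS_MASK) | natts);
    h->t_infomask &= (uint16_t) ~TOY_HEAP_HASNULL;
    for (int i = 0; i < natts; i++) {
        if (isnull[i])
            h->t_infomask |= TOY_HEAP_HASNULL;
        else
            h->t_bits[i >> 3] |= (uint8_t) (1 << (i & 0x07));
    }
}

/* Read back one column's null flag, the way the executor needs it. */
bool toy_header_isnull(const ToyHeapTupleHeader *h, int att) {
    assert(att >= 0 && att < (h->t_infomask2 & TOY_HEAP_NATTS_MASK));
    if (!(h->t_infomask & TOY_HEAP_HASNULL))
        return false;                    /* no nulls at all in this tuple */
    return toy_att_isnull(att, h->t_bits);
}
```

For the columns (value, NULL, value), the header records three columns and sets HASNULL. The bitmap byte becomes 0x05, with bits 0 and 2 set for the two non-null columns.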

The access method acts as a bridge in the middle, coordinating between the PostgreSQL core, which works with TTS, and the actual data storage, which for heap holds heap tuples.

Summary

This blog post gives a brief overview of the table access method API and describes how it coordinates between the Tuple Table Slot and heap tuple data formats. The table access method is a large architectural topic, and we will gradually explore its different API calls in subsequent blog posts.