New Module Initialization vs Bootstrap in PostgreSQL

Enterprise PostgreSQL Solutions

1.0 Introduction

Over the last couple of weeks, I have been preparing training material for PostgreSQL developers. One of the sections explains the basic principles of adding a custom module to the PostgreSQL backend. In addition to writing the logic for the new module, there are several things you need to be aware of for it to be fully compatible with the rest of the backend. Today I will explain these basic principles, based on PostgreSQL 15. Please note that "backend module" here does not mean a new process running in the background that you can observe with the ps -ef command; rather, it refers to a software component that the backend processes use to accomplish certain tasks.

2.0 Importance of Shared Memory

Shared memory in PostgreSQL is a big deal. All backend processes rely on shared memory to communicate with each other. It contains state information that is shared among all backends, and user data is also placed here first before it is presented to the user or flushed to disk. The largest part of shared memory is usually the buffer pool, whose size is configured by the shared_buffers parameter in PostgreSQL, with a default value of 128MB and a minimum value of 128kB. The total shared memory size also includes the space requested by the other modules.

3.0 General Initialization Procedure

Responsibilities of the postmaster (main entry function PostmasterMain() in src/backend/postmaster/postmaster.c):

  • Create shared memory
  • Initialize the backend modules (buffer manager, lock manager, etc.)
  • Start background processes (background writer, checkpointer, etc.)

Initialization of a backend module:

  • Usually consists of initializing its shared memory and then initializing the module itself
  • Usually done once, when the postmaster starts

Starting a background process:

  • At this point, the backend modules should already have been initialized
  • After a process starts, it can simply attach to the shared memory that has already been created

3.1 Shared Memory APIs

The shared memory APIs are implemented in src/backend/storage/ipc/shmem.c, which provides interfaces to initialize (create or attach to) a shared memory structure based on a label. If the label already exists, the call just “attaches” to the existing structure. If the label does not exist, it “creates” the structure by allocating a new region inside the shared memory segment.

Frequently called functions:

  • ShmemInitHash() creates or attaches to a hash table in shared memory
  • ShmemInitStruct() creates or attaches to a fixed-size structure in shared memory
  • add_size() adds two sizes with an overflow check and is used when computing the total shared memory size
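
To make the label-based "create or attach" behavior concrete, here is a minimal standalone model of it. It is only a sketch: toy_shmem_init_struct(), the static arena, and the small index table are all hypothetical stand-ins, and the real code in shmem.c works on the actual shared memory segment with locking and error reporting that this toy version leaves out. The point it illustrates is the found flag: the first call for a label allocates, and later calls return the same region.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 8
#define TOY_ARENA_SIZE  1024

/* One index entry per label (hypothetical layout). */
typedef struct ToyEntry
{
    const char *name;   /* label; the caller's string must outlive the entry */
    size_t      offset; /* start of the region inside the arena */
    size_t      size;   /* size requested at creation time */
} ToyEntry;

/* A static buffer stands in for the shared memory segment. */
static _Alignas(16) unsigned char toy_arena[TOY_ARENA_SIZE];
static size_t   toy_used = 0;
static ToyEntry toy_index[TOY_MAX_ENTRIES];
static int      toy_nentries = 0;

/*
 * Return the region registered under "name". If the label is new, allocate
 * it and set *found to false; otherwise set *found to true and return the
 * existing region. Returns NULL if space runs out or the size mismatches.
 */
void *
toy_shmem_init_struct(const char *name, size_t size, bool *found)
{
    size_t aligned;
    int    i;

    for (i = 0; i < toy_nentries; i++)
    {
        if (strcmp(toy_index[i].name, name) == 0)
        {
            *found = true;
            if (toy_index[i].size != size)
                return NULL;    /* same label, different size */
            return toy_arena + toy_index[i].offset;
        }
    }

    *found = false;
    if (toy_nentries == TOY_MAX_ENTRIES || size > TOY_ARENA_SIZE)
        return NULL;

    /* Round up so that the next region starts on a 16-byte boundary. */
    aligned = (size + 15) & ~(size_t) 15;
    if (aligned > TOY_ARENA_SIZE - toy_used)
        return NULL;

    toy_index[toy_nentries].name = name;
    toy_index[toy_nentries].offset = toy_used;
    toy_index[toy_nentries].size = size;
    toy_nentries++;
    toy_used += aligned;

    return toy_arena + toy_index[toy_nentries - 1].offset;
}
```

A caller would typically check the found flag and initialize the region only when it is false, since a true value means some earlier caller has already set it up.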

3.2 Estimate Total Shared Memory Size

Implemented in the CalculateShmemSize() function in src/backend/storage/ipc/ipci.c. It uses the interface functions provided by shmem.c to estimate the total shared memory required by all backend modules. Each module usually has a corresponding function that returns the memory size it requires.
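
As an illustration of the per-module size function idea, the sketch below computes the space a hypothetical module would need and adds it to a running total using overflow-checked arithmetic. All names here (checked_add_size(), checked_mul_size(), MyModuleHeader, MyModuleSlot, my_module_shmem_size()) are assumptions made for this example. The helpers only mirror the intent of add_size() and report overflow through a return value, whereas PostgreSQL's own function raises an error instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Overflow-checked addition: returns false instead of wrapping around. */
bool
checked_add_size(size_t a, size_t b, size_t *out)
{
    if (b > SIZE_MAX - a)
        return false;
    *out = a + b;
    return true;
}

/* Overflow-checked multiplication. */
bool
checked_mul_size(size_t a, size_t b, size_t *out)
{
    if (a != 0 && b > SIZE_MAX / a)
        return false;
    *out = a * b;
    return true;
}

/* Hypothetical shared state: one header followed by nslots slots. */
typedef struct MyModuleHeader
{
    int nslots;
} MyModuleHeader;

typedef struct MyModuleSlot
{
    int  id;
    long hits;
} MyModuleSlot;

/* Size function for the hypothetical module. */
bool
my_module_shmem_size(size_t nslots, size_t *out)
{
    size_t slots;

    if (!checked_mul_size(nslots, sizeof(MyModuleSlot), &slots))
        return false;
    return checked_add_size(sizeof(MyModuleHeader), slots, out);
}

/* Add the module's requirement to a running total of all modules. */
bool
add_module_to_total(size_t total_so_far, size_t nslots, size_t *out)
{
    size_t mine;

    if (!my_module_shmem_size(nslots, &mine))
        return false;
    return checked_add_size(total_so_far, mine, out);
}
```

The checks matter because a size that silently wraps around would lead to a segment that is far smaller than the modules expect.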

3.3 Request Shared Memory

Implemented in the CreateSharedMemoryAndSemaphores() function in src/backend/storage/ipc/ipci.c. It uses the interface functions provided by shmem.c to request shared memory. Each module usually has a corresponding function that is called from here. If we were to add a custom module, we may need to register a dedicated shared memory space for that module using the API.

3.4 Module Initialization

Once the shared memory has been set up, the custom module would normally initialize the shared memory it requested. Depending on the nature of the module, it may read from a file and populate some data in shared memory, or it may do no initialization at all.
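
A module's initialization step might look roughly like the following sketch. It assumes the caller has already obtained the region and knows whether it existed before (the found flag), and it uses a hypothetical MyModuleState struct and my_module_shmem_init() function. The state is filled in only on first creation, so a process that attaches later does not wipe data that is already in use.

```c
#include <stdbool.h>
#include <string.h>

#define MY_MODULE_MAGIC 0x4D4F4455u /* arbitrary marker value for this sketch */

/* Hypothetical module state kept in shared memory. */
typedef struct MyModuleState
{
    unsigned int magic;       /* marks the region as initialized */
    int          nentries;    /* number of valid entries */
    int          entries[16]; /* module data */
} MyModuleState;

/*
 * Initialize the module's region if it was just created; otherwise leave
 * the existing contents untouched. "region" must point to memory that is
 * at least sizeof(MyModuleState) bytes and suitably aligned.
 */
MyModuleState *
my_module_shmem_init(void *region, bool found)
{
    MyModuleState *state = (MyModuleState *) region;

    if (!found)
    {
        /* First creation: start from a known, empty state. */
        memset(state, 0, sizeof(*state));
        state->magic = MY_MODULE_MAGIC;
    }
    return state;
}
```

A module that loads data from a file would do so inside the same if (!found) branch, after zeroing the structure.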

4.0 Module Bootstrap Procedure

Bootstrap is another type of initialization that is normally performed once during database cluster initialization (during initdb). This is different from the regular module initialization, which is normally performed once at each database startup (pg_ctl -D $PGDATA start).

You can add bootstrap logic to the entry function BootStrapXLOG(), defined in src/backend/access/transam/xlog.c. This is where PostgreSQL initializes the very first control file, WAL file, commit log, etc. Any cluster-level module can bootstrap here.

Suppose we want to add a feature that encrypts WAL files, which would affect the entire cluster. We can add logic during bootstrap to encrypt the WAL files with the encryption key provided by the user. We could even create a new directory to store this encryption key.

5.0 Summary

In this blog, we introduced the shared memory API provided by PostgreSQL and dove into the process of initialization and bootstrap in the PostgreSQL backend. These topics are vital for developers looking to deepen their understanding of PostgreSQL or to get started with PostgreSQL development.