Over the last couple of weeks, I have been preparing training material for PostgreSQL developers. One of the sections explains the basic principles of adding a custom module to the PostgreSQL backend. In addition to writing the code logic for the new module, there are several things you need to be aware of for it to be fully compatible. Today I will explain these basic principles, based on PostgreSQL 15. Please note that a backend module here does not mean a new process running in the background that you can observe with the ps -ef command; rather, it refers to a software component that these backend processes can use to achieve certain tasks.
2.0 Importance of Shared Memory
Shared memory in PostgreSQL is a big deal. All backend processes rely on shared memory to communicate with each other. It contains state information that is shared among all backends, and user data is also placed here first before it is presented to the user or flushed to disk. The largest part of shared memory is usually the buffer pool, whose size is configured by the shared_buffers parameter, with a default value of 128MB and a minimum value of 128kB.
3.0 General Initialization Procedure
Responsibility of postmaster:
- Create shared memory
- Initialization of background modules (buffer manager, lock manager, etc.)
- Start background processes (background writer, checkpointer, etc.)
- Main entry function: PostmasterMain() in src/backend/postmaster/postmaster.c
Initialization of the background module:
- Usually refers to the initialization of shared memory + the initialization of the module itself
- Usually done once when the postmaster starts
Starting the background process:
- At this point, the background modules should have been initialized
- After a process starts, it can simply attach to the existing shared memory that has been created already.
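The ordering above (create shared memory, initialize the module, then start processes that use it) can be sketched outside PostgreSQL with plain POSIX calls. This is only an illustration under stated assumptions: ToyModuleState and toy_postmaster_run() are hypothetical names, and the sketch uses an anonymous shared mapping inherited across fork() rather than PostgreSQL's own code paths.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on strict compilers */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical module state that lives in shared memory. */
typedef struct ToyModuleState
{
    int initialized;    /* set once by the parent ("postmaster") */
    int child_visits;   /* written by the child ("backend process") */
} ToyModuleState;

/*
 * The parent creates and initializes the shared area, then starts a child
 * that uses it. Returns the number of visits the child recorded, or -1 on
 * error.
 */
int
toy_postmaster_run(void)
{
    ToyModuleState *state;
    pid_t           pid;
    int             status;
    int             visits;

    /* Step 1: create the shared memory before any child exists. */
    state = mmap(NULL, sizeof(ToyModuleState), PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (state == MAP_FAILED)
        return -1;

    /* Step 2: initialize the module once, in the parent. */
    state->initialized = 1;
    state->child_visits = 0;

    /* Step 3: start a child process; it inherits the shared mapping. */
    pid = fork();
    if (pid < 0)
    {
        munmap(state, sizeof(ToyModuleState));
        return -1;
    }
    if (pid == 0)
    {
        /* Child: the module is already initialized, so just use it. */
        if (state->initialized)
            state->child_visits++;
        _exit(0);
    }

    /* Parent: wait for the child, then read what it wrote. */
    if (waitpid(pid, &status, 0) < 0)
    {
        munmap(state, sizeof(ToyModuleState));
        return -1;
    }
    visits = state->child_visits;
    munmap(state, sizeof(ToyModuleState));
    return visits;
}
```

Because the mapping is shared, the increment made by the child is visible to the parent after waitpid() returns, which is the property the background processes rely on.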
3.1 Shared Memory APIs
src/backend/storage/ipc/shmem.c provides interfaces to initialize (create or attach to) shared memory based on a label. If the label already exists, the call simply “attaches” to the existing shared memory. If the label does not exist, it “creates” a new shared memory area under that label.
Frequently called functions:
- ShmemInitHash() initializes a hash table in shared memory
- ShmemInitStruct() initializes a fixed-size structure in shared memory
- add_size() adds two sizes with an overflow check and is used when totaling the required shared memory size
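To make the create-or-attach behaviour concrete, here is a small self-contained model of a label-indexed lookup. It is not the code in shmem.c: toy_shmem_init_struct() and the index behind it are hypothetical, ordinary heap memory stands in for shared memory, and only the (label, size, found) calling shape is borrowed from ShmemInitStruct().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_AREAS 16
#define TOY_LABEL_LEN 64

/* One index entry: a label and the area registered under it. */
typedef struct ToyShmemEntry
{
    char    label[TOY_LABEL_LEN];
    void   *area;
    size_t  size;
} ToyShmemEntry;

static ToyShmemEntry toy_index[TOY_MAX_AREAS];
static int           toy_index_used = 0;

/*
 * Return the area registered under "label", creating it if necessary.
 * *found reports whether the area already existed. Returns NULL on failure
 * (label too long, index full, out of memory, or size mismatch).
 */
void *
toy_shmem_init_struct(const char *label, size_t size, bool *found)
{
    int   i;
    void *area;

    *found = false;
    if (strlen(label) >= TOY_LABEL_LEN)
        return NULL;

    /* Label already registered: attach to the existing area. */
    for (i = 0; i < toy_index_used; i++)
    {
        if (strcmp(toy_index[i].label, label) == 0)
        {
            if (toy_index[i].size != size)
                return NULL;        /* same label, different size */
            *found = true;
            return toy_index[i].area;
        }
    }

    /* Label not registered: create a new zero-filled area. */
    if (toy_index_used >= TOY_MAX_AREAS)
        return NULL;
    area = calloc(1, size);
    if (area == NULL)
        return NULL;

    strcpy(toy_index[toy_index_used].label, label);
    toy_index[toy_index_used].area = area;
    toy_index[toy_index_used].size = size;
    toy_index_used++;
    return area;
}
```

Calling the function twice with the same label returns the same pointer, with found set to false the first time and true the second time.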
3.2 Estimate Total Shared Memory Size
The CalculateShmemSize() function in src/backend/storage/ipc/ipci.c uses the interface functions provided by shmem.c to estimate the total shared memory required by all background modules. Each module usually has a corresponding function that returns the memory size it requires.
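The estimation pattern can be modeled in a few lines: every module exposes a size function, and the totals are accumulated with an overflow-checked addition. The names below (toy_add_size(), toy_lock_module_shmem_size(), toy_cache_module_shmem_size(), toy_calculate_shmem_size()) and the byte counts are made up for illustration, and the sketch reports overflow through a flag where a real server would raise an error instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add two sizes. If the sum wraps around, *ok is set to false.
 * *ok is never set back to true, so one flag can cover a whole chain of
 * additions; the caller must initialize it to true.
 */
size_t
toy_add_size(size_t s1, size_t s2, bool *ok)
{
    size_t result = s1 + s2;

    if (result < s1 || result < s2)
        *ok = false;
    return result;
}

/* Each hypothetical module reports how much shared memory it needs. */
static size_t
toy_lock_module_shmem_size(void)
{
    return 1024;
}

static size_t
toy_cache_module_shmem_size(void)
{
    return 4096;
}

/* Total the requirements of all modules before anything is allocated. */
size_t
toy_calculate_shmem_size(bool *ok)
{
    size_t size = 0;

    size = toy_add_size(size, toy_lock_module_shmem_size(), ok);
    size = toy_add_size(size, toy_cache_module_shmem_size(), ok);
    return size;
}
```

A custom module would add one more size function and one more toy_add_size() line, which mirrors how a new module would contribute its own estimate to the total.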
3.3 Request Shared Memory
The CreateSharedMemoryAndSemaphores() function in src/backend/storage/ipc/ipci.c uses the interface functions provided by shmem.c to request shared memory. Each module usually has a corresponding function that is called from here. If we were to add a custom module, we may need to register a dedicated shared memory space for that module using the same API.
3.4 Module Initialization
Once the shared memory has been set up, the custom module would normally initialize the shared memory it requested. Depending on the nature of the module, it may read from a file and populate some data in shared memory, or do no initialization at all.
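A common shape for this step is "populate only if the area was newly created". The sketch below models it with a single static slot standing in for the requested shared memory; DemoModuleState, demo_request_area() and demo_module_shmem_init() are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state kept by the custom module. */
typedef struct DemoModuleState
{
    int generation;     /* set when the area is first populated */
    int counter;        /* updated later by whoever uses the module */
} DemoModuleState;

/* A single static slot stands in for the requested shared memory area. */
static DemoModuleState demo_slot;
static bool            demo_slot_created = false;

/* Hand out the area and report whether it had been created before. */
static DemoModuleState *
demo_request_area(bool *found)
{
    *found = demo_slot_created;
    demo_slot_created = true;
    return &demo_slot;
}

/*
 * Module initialization. Safe to call more than once: only the call that
 * creates the area populates it, later calls leave its contents alone.
 */
DemoModuleState *
demo_module_shmem_init(void)
{
    bool             found;
    DemoModuleState *state = demo_request_area(&found);

    if (!found)
    {
        /* First time through: put the area into a known state. */
        memset(state, 0, sizeof(*state));
        state->generation = 1;
    }
    return state;
}
```

A module that needs to load data from a file would do so inside the !found branch, so the work happens once and processes that attach later see the populated state.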
4.0 Module Bootstrap Procedure
Bootstrap is another type of initialization that is normally performed once, during database cluster initialization (during initdb). This is different from the regular module initialization, which is normally performed each time the database starts (pg_ctl -D $PGDATA start).
You can add bootstrap logic in the entry function BootStrapXLOG(), defined in src/backend/access/transam/xlog.c. This is where PostgreSQL initializes the very first control file, WAL file, commit log, etc. Any cluster-level module can bootstrap here.
Suppose we want to add a feature that encrypts WAL files, which affects the entire cluster. We can add logic during bootstrap to encrypt the WAL files with an encryption key provided by the user. We could even create a new directory to store this encryption key.
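The difference between bootstrap and regular startup can be sketched with a file that is written exactly once and read on every start. Everything here is an assumption made for illustration: demo_bootstrap() and demo_startup() are hypothetical, the key is stored as plain bytes purely as a stand-in, and no actual WAL encryption is shown.

```c
#include <stddef.h>
#include <stdio.h>

#define DEMO_KEY_LEN 16

/*
 * Bootstrap: meant to run once, when the cluster is created. Writes the
 * key file and refuses to overwrite an existing one.
 * Returns 0 on success, -1 on failure.
 */
int
demo_bootstrap(const char *path, const unsigned char key[DEMO_KEY_LEN])
{
    FILE   *f;
    size_t  written;

    f = fopen(path, "rb");
    if (f != NULL)
    {
        fclose(f);
        return -1;              /* already bootstrapped */
    }

    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    written = fwrite(key, 1, DEMO_KEY_LEN, f);
    if (fclose(f) != 0 || written != DEMO_KEY_LEN)
        return -1;
    return 0;
}

/*
 * Startup: meant to run every time the server starts. Loads the state
 * that bootstrap left behind and fails if bootstrap never ran.
 * Returns 0 on success, -1 on failure.
 */
int
demo_startup(const char *path, unsigned char key_out[DEMO_KEY_LEN])
{
    FILE   *f;
    size_t  got;

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;              /* not bootstrapped yet */
    got = fread(key_out, 1, DEMO_KEY_LEN, f);
    fclose(f);
    return (got == DEMO_KEY_LEN) ? 0 : -1;
}
```

In this model, demo_bootstrap() plays the role of the one-time work done at initdb, while demo_startup() plays the role of the per-start module initialization that consumes what bootstrap created.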
In this blog, we introduced the shared memory API provided by PostgreSQL and dove into the initialization and bootstrap process in the PostgreSQL backend. These topics are vital for developers looking to deepen their understanding of PostgreSQL or get started with PostgreSQL development.
Cary is a Senior Software Developer at HighGo Software Canada with 8 years of industrial experience developing innovative software solutions in C/C++ in the field of smart grid & metering prior to joining HighGo. He earned a bachelor's degree in Electrical Engineering from the University of British Columbia (UBC) in Vancouver in 2012 and has extensive hands-on experience in technologies such as advanced networking, network and data security, smart metering innovations, deployment management with Docker, software engineering lifecycle, scalability, authentication, cryptography, PostgreSQL and non-relational databases, web services, firewalls, embedded systems, RTOS, ARM, PKI, Cisco equipment, and functional and architecture design.