The smgr interface in PostgreSQL

1. Overview

In my previous blog posts, I explained how the amazing buffer tag works in PostgreSQL and how to set up shared storage using the Lustre network file system. In this blog post, I will explain the storage interface provided by PostgreSQL, namely smgr, the storage manager, and share an idea for experimenting with it.

2. smgr

smgr is the storage manager in PostgreSQL. It exposes a set of public interface routines used to store the tuples of tables and indexes. In my opinion, PostgreSQL mainly persists three types of files on disk: 1) “configuration” related files, such as postgresql.conf, pg_hba.conf, and pg_control; 2) Write-Ahead Log (WAL) files; and 3) the actual files that store table and index tuples. smgr is designed to handle only the third category. In other words, if you want to separate compute and storage, you should treat these three categories differently, not only because PostgreSQL has different logic for each of them, but also because smgr was not designed to handle configuration files or WAL. Here are the interfaces defined in smgr.c:

static const f_smgr smgrsw[] = {
    /* magnetic disk */
    {
        .smgr_init = mdinit,
        .smgr_shutdown = NULL,
        .smgr_open = mdopen,
        .smgr_close = mdclose,
        .smgr_create = mdcreate,
        .smgr_exists = mdexists,
        .smgr_unlink = mdunlink,
        .smgr_extend = mdextend,
        .smgr_prefetch = mdprefetch,
        .smgr_read = mdread,
        .smgr_write = mdwrite,
        .smgr_writeback = mdwriteback,
        .smgr_nblocks = mdnblocks,
        .smgr_truncate = mdtruncate,
        .smgr_immedsync = mdimmedsync,
    }
};

The interface contains all the functions required for storage management, including the basic file operations we are all familiar with: create, open, read, write, and close. The names of the actual implementation functions all start with md because PostgreSQL was originally designed to handle magnetic disks.
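To make the dispatch idea concrete, here is a minimal sketch of a function-pointer table in the same style. It is not PostgreSQL code: the toy_* and mem_* names are made up for illustration, the struct has only three of the slots, and the backend keeps pages in memory rather than in files.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLCKSZ     8192   /* 8KB pages, matching PostgreSQL's default */
#define TOY_MAX_BLOCKS 4      /* tiny capacity, enough for a demo */

/* Hypothetical, cut-down f_smgr: one function pointer per operation. */
typedef struct toy_smgr
{
    void     (*smgr_write) (unsigned blocknum, const char *buffer);
    void     (*smgr_read) (unsigned blocknum, char *buffer);
    unsigned (*smgr_nblocks) (void);
} toy_smgr;

/* Toy "mem" backend: pages live in a static array instead of a file. */
static char     mem_pages[TOY_MAX_BLOCKS][TOY_BLCKSZ];
static unsigned mem_used = 0;

static void
mem_write(unsigned blocknum, const char *buffer)
{
    assert(blocknum < TOY_MAX_BLOCKS);
    memcpy(mem_pages[blocknum], buffer, TOY_BLCKSZ);
    if (blocknum >= mem_used)
        mem_used = blocknum + 1;    /* writing past the end grows the relation */
}

static void
mem_read(unsigned blocknum, char *buffer)
{
    assert(blocknum < mem_used);
    memcpy(buffer, mem_pages[blocknum], TOY_BLCKSZ);
}

static unsigned
mem_nblocks(void)
{
    return mem_used;
}

/* The switch table: entry 0 is the only backend, as in smgrsw[]. */
static const toy_smgr toy_smgrsw[] = {
    {
        .smgr_write = mem_write,
        .smgr_read = mem_read,
        .smgr_nblocks = mem_nblocks,
    }
};

/* Callers go through the table and never name the backend directly. */
void
toy_smgrwrite(int which, unsigned blocknum, const char *buffer)
{
    assert(which == 0);
    toy_smgrsw[which].smgr_write(blocknum, buffer);
}

void
toy_smgrread(int which, unsigned blocknum, char *buffer)
{
    assert(which == 0);
    toy_smgrsw[which].smgr_read(blocknum, buffer);
}

unsigned
toy_smgrnblocks(int which)
{
    assert(which == 0);
    return toy_smgrsw[which].smgr_nblocks();
}
```

Swapping the storage backend then means filling the table with a different set of functions, while the callers of toy_smgrwrite and toy_smgrread stay unchanged.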

PostgreSQL can handle all table and index tuples with these interfaces thanks to the magic buffer tag, which addresses a specific 8KB page block at any given time. For details on how the buffer tag is used, you can refer to “3. How Buffer Tag is used” in my earlier post on the buffer tag.
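As a rough illustration of how a block number ends up as a position on disk, the sketch below splits a block number into a segment file number and a byte offset. It assumes the default build settings (8KB blocks and 131072 blocks, or 1GB, per segment file). The toy_locate_block helper and its struct are hypothetical names for this post, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ              8192u        /* default page size in bytes */
#define RELSEG_SIZE         131072u      /* default blocks per segment file (1GB) */
#define InvalidBlockNumber  0xFFFFFFFFu  /* reserved "no block" value */

/* Hypothetical result type: where a block lives on disk. */
typedef struct toy_block_location
{
    uint32_t segno;     /* 0 is the base file, 1 is the ".1" file, and so on */
    uint64_t offset;    /* byte offset of the page inside that segment file */
} toy_block_location;

toy_block_location
toy_locate_block(uint32_t blocknum)
{
    toy_block_location loc;

    assert(blocknum != InvalidBlockNumber);
    loc.segno = blocknum / RELSEG_SIZE;
    /* widen before multiplying so the byte offset cannot overflow 32 bits */
    loc.offset = (uint64_t) (blocknum % RELSEG_SIZE) * BLCKSZ;
    return loc;
}
```

With these defaults, block 131073 lands in the second segment file at byte offset 8192, that is, the second page of that file.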

3. Separate compute and storage

To experiment with the smgr interface, a simple idea is to build a TCP/IP server and client. On the server side, add logic to implement some basic smgr interfaces such as create, open, read, write, and close. Then, define a simple protocol to indicate the different actions and their corresponding error codes. On the client side, replace the “mdxxxx” functions in smgr.c with functions that forward those file operations to the server through a TCP/IP connection. The result of reading a page from, or writing a page to, the server is then sent back to the caller. With these minimal changes, a very simple prototype can be set up to separate compute and storage. However, a working solution with compute and storage separated needs to consider many more factors. An open-source compute and storage solution called Neon has been designed to achieve this with a much more comprehensive design.
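Going back to the simple prototype, the “simple protocol” could start with a fixed-size request header like the one sketched below. Everything here is an assumption made for illustration: the opcodes, the status codes, the 9-byte layout, and the toy_* names are not part of PostgreSQL or Neon. A write request would be followed by one 8KB page, and a read response by a status byte plus one 8KB page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes, one per forwarded smgr operation. */
enum toy_op { TOY_OP_CREATE = 1, TOY_OP_OPEN, TOY_OP_READ, TOY_OP_WRITE, TOY_OP_CLOSE };

/* Hypothetical status codes returned by the decoder (and by the server). */
enum toy_status { TOY_OK = 0, TOY_ERR_BAD_OP = 1, TOY_ERR_SHORT = 2 };

/* In-memory form of a request: which operation, which relation, which block. */
typedef struct toy_request
{
    uint8_t  op;
    uint32_t relnumber;
    uint32_t blocknum;
} toy_request;

/* Wire layout: 1 byte op, 4 bytes relnumber, 4 bytes blocknum (big-endian). */
#define TOY_REQUEST_WIRE_SIZE 9u

static void
put_u32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t) (v >> 24);
    dst[1] = (uint8_t) (v >> 16);
    dst[2] = (uint8_t) (v >> 8);
    dst[3] = (uint8_t) v;
}

static uint32_t
get_u32(const uint8_t *src)
{
    return ((uint32_t) src[0] << 24) | ((uint32_t) src[1] << 16) |
           ((uint32_t) src[2] << 8) | (uint32_t) src[3];
}

/* Serialize a request; returns bytes written, or 0 if buf is too small. */
size_t
toy_encode_request(const toy_request *req, uint8_t *buf, size_t buflen)
{
    assert(req != NULL && buf != NULL);
    if (buflen < TOY_REQUEST_WIRE_SIZE)
        return 0;
    buf[0] = req->op;
    put_u32(buf + 1, req->relnumber);
    put_u32(buf + 5, req->blocknum);
    return TOY_REQUEST_WIRE_SIZE;
}

/* Parse a request received from the socket and validate the opcode. */
enum toy_status
toy_decode_request(const uint8_t *buf, size_t len, toy_request *req)
{
    assert(buf != NULL && req != NULL);
    if (len < TOY_REQUEST_WIRE_SIZE)
        return TOY_ERR_SHORT;
    if (buf[0] < TOY_OP_CREATE || buf[0] > TOY_OP_CLOSE)
        return TOY_ERR_BAD_OP;
    req->op = buf[0];
    req->relnumber = get_u32(buf + 1);
    req->blocknum = get_u32(buf + 5);
    return TOY_OK;
}
```

The bytes are written one at a time in a fixed order, so the client and the server agree on the layout regardless of CPU endianness or struct padding. The socket code itself is left out of the sketch.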

As an example, one implementation of the smgr interfaces can be found in pagestore_smgr.c, with some of the changes shown below.

static const struct f_smgr neon_smgr =
{
    .smgr_init = neon_init,
    .smgr_shutdown = NULL,
    .smgr_open = neon_open,
    .smgr_close = neon_close,
    .smgr_create = neon_create,
    .smgr_exists = neon_exists,
    .smgr_unlink = neon_unlink,
    .smgr_extend = neon_extend,
    .smgr_prefetch = neon_prefetch,
    .smgr_read = neon_read,
    .smgr_write = neon_write,
    .smgr_writeback = neon_writeback,
    .smgr_nblocks = neon_nblocks,
    .smgr_truncate = neon_truncate,
    .smgr_immedsync = neon_immedsync,

    .smgr_start_unlogged_build = neon_start_unlogged_build,
    .smgr_finish_unlogged_build_phase_1 = neon_finish_unlogged_build_phase_1,
    .smgr_end_unlogged_build = neon_end_unlogged_build,
};

After re-implementing or wrapping these smgr interfaces, table and index tuples can be written to a local cache, with the corresponding WAL generated for later redo. For more details, you can check the source code in “pagestore_smgr.c”.

4. Summary

In this blog post, we discussed the smgr interface and proposed a simple idea for people who are interested in how PostgreSQL handles tuple storage. We also pointed to a real example for those who want to dig into the details of PostgreSQL storage.