The Share Invalid mechanism of CACHE in PostgreSQL

Enterprise PostgreSQL Solutions

Comments are off

The Share Invalid mechanism of CACHE in PostgreSQL

PostgreSQL will create some catalog and other caches for each connection request to store some queried data locally, which can speed up query efficiency. If a process undergoes a DDL operation that causes catalog to change, other running processes need to be notified that some cache data is invalid. We call this notification mechanism the SI (Share Invalid) mechanism.

I have made use of this feature in projects that I have worked before but never studied it in depth to understand it completely or appreciate its power. However I got a chance recently to experiment with this on a project and was able to learn more about it. This promoted to write this blog so I can share my knowledge on this explain the affectiveness of this feature.

1. Brief description of PostgreSQL cache

PostgreSQL’s cache is mainly the catalog cache. You can see the enum structure SysCacheIdentifier of the catalog cache in syscache.h. The catalog caches is mainly used to cache the contents of catalog, and there are some non-system caches too, these caches can be seen in the src/backend/utils/cache directory of the code.

describe_cache_in_postgres

As shown in the figure above, there is a simple query on the t1 table. Before the query, the backend first needs to obtain what t1 is, such as whether there is a t1 relationship, what attributes does this relationship have, and so on. Of course, this information is stored in each catalog. When backend obtains catalog information, it generally does not directly scan the hard disk data of the catalog, but tries to obtain data from the relevant cache. If the data that backend wants does not exist in the cache , the data will be scanned from the hard disk and cached in the cache.

2. Implementation of cache consistency

Each backend of PostgreSQL caches some catalog information, which creates the problem of cache consistency. Suppose there are two backends A and B who cached ‘t1’ tuple in RELNAMENSP, and then A has made some changes to the structure of t1, then B needs to be notified to invalidate this entry in its cache. The next time this entry is used on B backend, it needs to be scanned from the disk again.

①This is a cache of the pg_class catalog. The structure of this cache can be seen in the picture in Part 3.

share_invalid_mechanism

If the DDL statement of a backend modifies a tuple of a system table, it will register an msg locally to mark an item in the cache as invalid. If multiple tuples are modified, a msg list will be formed. As shown in the figure, the current msg list refers to the msg generated in a command of a transaction. When the command is switched, the msg in the current list will be merged into the prior.

The transaction in PostgreSQL may be divided into multiple commands (such as multiple QUERY). If a command modifies the catalog, the subsequent commands of this transaction need to be able to obtain the latest data in the local cache, and other backends can only obtain old data before current transaction committed.

When a local command is completed, the msg in the current list will be used to mark the corresponding entry in the cache in this backend as invalid. In this way, the next command will not get the old data from the cache. When this transaction is committed, all local msgs will be added to the sharebuffer. After bacnend B receives the SQL request, it will first check the msgs in the sharebuffer, and use these msgs to process its own cache, so as to ensure that the latest data is obtained from the cache.

3. Handle the SI message

typedef union
{
    int8        id;             /* type field --- must be first */
    SharedInvalCatcacheMsg cc;
    SharedInvalCatalogMsg cat;
    SharedInvalRelcacheMsg rc;
    SharedInvalSmgrMsg sm;
    SharedInvalRelmapMsg rm;
    SharedInvalSnapshotMsg sn;
} SharedInvalidationMessage;

Six types of SI messages are described in this union, each of which has its own unique structure. The aforementioned message to mark a tuple of a catalog cache as invalid is SharedInvalCatcacheMsg, only this kind of message is explained in this blog.

typedef struct
{
    int8        id;             /* cache ID --- must be first */
    Oid         dbId;           /* database ID, or 0 if a shared relation */
    uint32      hashValue;      /* hash value of key for this catcache */
} SharedInvalCatcacheMsg;

When the tuple of each catalog is stored in the cache, a hash value is calculated based on the cache key (for example, the key of RELNAMENSP is the Anum_pg_class_relname and Anum_pg_class_relname columns of the pg_class table), as shown in the hashValue attribute in the CatCTup structure. When processing the msg of SharedInvalCatcacheMsg type, the corresponding cache will be found according to the id, and then the corresponding CatCTup will be found according to the hashValue, and the dead attribute will be marked as true.

si_msg

4. Shared buffer processing of SI messages

The final submitted messages generated by a backedn will eventually be added to the message queue of the sharebuffer. This section will describe how these messages are managed in the sharebuffer.

si_sharebuffer

SISeg is a structure for storing SI information in a shared buffer, where buffer is an array for storing messages, which can be regarded as a circular array, where minMsgNum and maxMsgNum record the starting index and end index of still valid messages.

The procState records the processing of SI messages by each backend, where nextMsgNum is the index of the next message to be processed by a certain backend.

When the buffer is full (there is a special mechanism to determine what is ‘full’), it will trigger to clean up messages that are no longer needed. If a backend is inactive for a long time, it will cause the messages in the buffer to accumulate. At this time, when cleaning the buffer, it will choose to mark the backend as the state of resetState, and then the backend needs to awaken all of its Cache is set to invalid, or choose to send a PROCSIG_CATCHUP_INTERRUPT signal to this backend, this backend will process msg after receiving the signal.

Summary

This blog describes the principle that PostgreSQL keeps the cache of each backend consistent. If we add a new interface to modify the database data and cause the cache to change, then we need to consider sending an SI message so that other backends can get the latest from the cache. This is transparent to the DBA, but as a kernel developer, you have to understand these subtle issues to prevent consistency errors.