How to create a system information function in PostgreSQL

Enterprise PostgreSQL Solutions

Comments are off

How to create a system information function in PostgreSQL

1. Overview

PostgreSQL supports many System Information Functions, such as Session Information Functions, Access Privilege Inquiry Functions, Schema Visibility Inquiry Functions, System Catalog Information Functions, Transaction ID and Snapshot Information Functions, etc. However, you may want build some special functions and integrate them into the PostgreSQL. This blog is going to walk through the steps about how to build your own System Information Functions into PostgreSQL.

2. Analyze the requirement

Since there are so many functions built-in PostgreSQL already, you should perform some research and analysis before you decide to create a new one. In this blog, I want to check the transaction id after each savepoints in an ongoing transaction, so that I can perform some visibility check before the whole transaction is committed. For example, I have a query like below, but I can’t figure out the transaction id after each savepoint if I use existing System Information Function.

postgres=# CREATE TABLE tbl (data text);
CREATE TABLE
postgres=# BEGIN;
BEGIN
postgres=# INSERT INTO tbl VALUES('HelloWorld-1');
INSERT 0 1
postgres=# SELECT txid_current();
 txid_current 
--------------
          488
(1 row)

postgres=# SAVEPOINT AA;
SAVEPOINT
postgres=# INSERT INTO tbl VALUES('HelloWorld-2');
INSERT 0 1

postgres=# SELECT txid_current();
 txid_current 
--------------
          488
(1 row)

postgres=#

As you can see, using existing txid_current() function, I always get the same Transaction ID 488 even after a savepoint. For this reason, I decide to create my own system information function to retrieve the information I want.

3. System Information Function template

To create your own System Information Function, you need to check the System Catalogs to see in which category it can fit. For the case mentioned above, I chose the catalog pg_proc which stores information about functions, procedures, aggregate functions, and window functions. The document System Catalog Initial Data provides more detailed description and examples about the .dat file format and the rules to define your own OID.

Now, let’s define a function, say current_xid_list();. Below is an example about how the function initial data may look like.

# function to get the top and sub transactions XIDs
{ oid => '5566', descr => 'get current transaction list',
  proname => 'current_xid_list', provolatile => 's',
  prorettype => 'txid_snapshot', proargtypes => '',
  prosrc => 'current_xid_list' },  

Here, you need to generate your OID which doesn’t create any conflict. In the official PostgreSQL document, 69.2.2. OID Assignment, it descripts how the OIDs are managed and also provides a script src/include/catalog/unused_oids to list the unused OIDs. For example, if you run the script like below, you will get a list of OIDs that are not used by PostgreSQL yet.

$ src/include/catalog/unused_oids
4 - 9
210
270 - 273
357
380 - 381
421
560 - 583
606
702 - 704
760 - 763
784 - 789
811 - 816
1177
1179 - 1180
1382 - 1383
1986 - 1987
2023
2030
2121
2137
2228
3432
3434 - 3435
3998
4035
4142
4187 - 4199
4225 - 4299
4388 - 4399
4532 - 4565
4572 - 4999
5022 - 5027
5032 - 5554
5556 - 5999
6015 - 6099
6103
6105
6107 - 6109
6116
6122 - 9999

In the example above, I picked up the OID 5566 just for easy to remember. However, if you are planning to generate a System Information Function as patch and later submit to PostgreSQL community, then you better follow the rule to minimize the risk of OID collisions with other patches. What the PostgreSQL community recommended is a random OID number between 8000—9999. Here how it is described in the official document, When choosing OIDs for a patch that is not expected to be committed immediately, best practice is to use a group of more-or-less consecutive OIDs starting with some random choice in the range 8000—9999.

After this system information initial data definition, we need to define the actual c function which can help to retrieve the information we need. In the case above, there might be a bunch of SAVEPOINTs in one transaction, so we need return a list of those ongoing transaction IDs. Below is an example, which refer to the way that how the function txid_current_snapshot was built.

/*
 * Same as txid_current_snapshot(), but returns top and sub transactions list
 *
 *      Return current top and sub transactions in TXID format
 *
 * Note that both top-transaction and sub-transaction XIDs are included.
 */
Datum
current_xid_list(PG_FUNCTION_ARGS)
{
    TxidSnapshot *snap;
    uint32      nxip,
                i;
    TxidEpoch   state;
    Snapshot    cur;
    xl_xact_assignment *sublist;
    TransactionId curxid = GetCurrentTransactionIdIfAny();

    cur = GetActiveSnapshot();
    if (cur == NULL)
        elog(ERROR, "no active snapshot set");

    load_xid_epoch(&state);

    StaticAssertStmt(MAX_BACKENDS * 2 <= TXID_SNAPSHOT_MAX_NXIP,
                     "possible overflow in txid_current_snapshot()");

    /* allocate */
    nxip = cur->xcnt;
    snap = palloc(TXID_SNAPSHOT_SIZE(nxip));
    sublist = palloc(MinSizeOfXactAssignment);

    /* fill */
    GetCurrentSubTransactionIdList(sublist);
    snap->xmin = sublist->xtop;
    snap->xmax = curxid;
    snap->nxip = sublist->nsubxacts;
    for (i = 0; i < snap->nxip; i++)
        snap->xip[i] = sublist->xsub[i];

    sort_snapshot(snap);

    /* set size after sorting, because it may have removed duplicate xips */
    SET_VARSIZE(snap, TXID_SNAPSHOT_SIZE(snap->nxip));

    PG_RETURN_POINTER(snap);
}

In the above function, you can define your data format. We decided to define a function which can help retrieve the transaction IDs within current transaction.

/*
 * GetCurrentSubTransactionIdList
 */
void
GetCurrentSubTransactionIdList(xl_xact_assignment *list)
{
   TransactionState s = CurrentTransactionState;

   list->nsubxacts = 0;
   list->xtop = GetTopTransactionIdIfAny();

   for (s = CurrentTransactionState; s != NULL; s = s->parent)
   {
       if (s->state == TRANS_ABORT)
           continue;
       if (!FullTransactionIdIsValid(s->fullTransactionId))
           continue;

       list->xsub[list->nsubxacts++] = s->fullTransactionId.value;
   }
}

Now, if you recompile PostgreSQL with above changes (of course, you need to declare the functions in proper files), you should be able to use your own System Information Function to see the difference based on what we has discussed at the beginning.

4. Testing

Once build success, restart the PostgreSQL server. Now, let’s try the previous SQL query again. As you can see, a different transaction ID 489 after a checkpoint is showing up.

postgres=# INSERT INTO tbl VALUES('HelloWorld-1');
INSERT 0 1
postgres=# SELECT txid_current();
 txid_current 
--------------
          488
(1 row)

postgres=# SAVEPOINT AA;
SAVEPOINT
postgres=# INSERT INTO tbl VALUES('HelloWorld-2');
INSERT 0 1

postgres=# SELECT txid_current();
 txid_current 
--------------
          488
(1 row)

postgres=# SELECT current_xid_list();
 current_xid_list 
------------------
 488:489:488,489
(1 row)

If you add another savepoint, and then compare your newly added function current_xid_list() with existing txid_current(), you can see that current_xid_list() displays not only the top transaction ID, but also all the transaction IDs after each savepoint.

5. Summary

In this blog, we discussed PostgreSQL System Information Functions, and perform a simple walk-through about how to create your own System Information Function following the official document, and run some simple tests.