Explore Table Access Method Capabilities: How Data Insertion is Handled

Enterprise PostgreSQL Solutions

Comments are off

Explore Table Access Method Capabilities: How Data Insertion is Handled

Introduction

In previous blogs, we talked about an overview of PostgreSQL’s table access method API here , and how sequential scan is handled within this API here. Today we will explore how data insertion is handled by the table access method API.

APIs Involved

The main API for handling insertion is called tuple_insert, which points to the heapam_tuple_insert() function as heap is the default access method set by PostgreSQL. A tuple can be understood as a row in a table in which we are trying to insert. Both terms will be used interchangeably below.


void		(*tuple_insert) (Relation rel, TupleTableSlot *slot,
		CommandId cid, int options,
		struct BulkInsertStateData *bistate);

The Relation routine represents the table that the data is trying to insert, which normally contains the OID of the table, the table’s tupleDescriptor structure which describes all the columns and their respective data types.

The TupleTableSlot (TTS) contains the actual row data represented in a format understood by the PostgreSQL internal module such as the executor. One of the responsibilities of heapam_tuple_insert() routine is to convert TTS into the heap format in which the heap access table can understand. This is the format that will be stored in the buffer manager module and eventually in the physical disk.

The CommandId field represents the command ID, which is the sequence ID in which this row is inserted within a transaction. The lower the command ID, the earlier they are inserted.

The options contains additional flags that should be included in the row and BulkInsertStateData is a control structure for bulk-insertion, which we will not talk in detail today.

Involvement of Other Modules

The heap access method itself is not sufficient to complete a data insertion. It relies on

  • Buffer Manager – to determine the placement of this new row (which block at which offset)
  • Wal Manager – to write a WAL record indicating this insertion event (for future recovery purposes)
  • Storage Manager – invoked by buffer manager to physically read or write a block from or to the disk

Basic Workflow

heap_tuple_insert is where TTS data is converted to Heap Tuple and it passes this data format to heap_insert() to handle the rest.

Several things happen inside heap_insert():

heap_prepare_insert()

This routine prepares the header of this tuple. A header normally contains:

  • xmin – the transaction ID that inserts this tuple (that is us! So we fill this with the value returned by GetCurrentTransactionId() call.
  • xmax – the transaction ID that removes this tuple. We are inserting, so no one is deleting it, so normally we mark it 0 here.
  • cid – the command ID of this tuple. This is given to us from the executor.
  • table OID – the OID of the table. This is also given to us from the executor.
  • Other flags information such as if the tuple is a TOAST tuple or a FROZEN tuple. We will discuss these next time.

Interaction with Buffer Manager

We should have the heap tuple built at this point, we need a place to put it. Heap access method relies on buffer manager module to find a place to store this tuple.

It calls RelationGetBufferForTuple() routine and provide the size of the tuple we have just built. Based on the size of the tuple, The buffer manager will find a suitable location (data block, also called a data page). It relies on FSM (Free Space Map) to determine which block has enough space to put this tuple. If exiting blocks cannot satisfy, the buffer manager will request to extend its storage via storage manager, which means to create a new block of data just for this tuple.

Once we have a buffer location from buffer manager, heap_insert() will then call RelationPutHeapTuple() to actually put the heap data into the buffer location specified.

Finally, heap_insert() marks this block as “dirty”, because we have just added some data to it. A dirty block means that this block has a new update and has not yet been flushed onto the disk. During the next checkpoint period, dirty blocks will be flushed to disk first.

Interaction with WAL Manager

After heap_insert() has written some data to a buffer block, it will use WAL manager routines to populate a proper WAL record and insert a new WAL record via the WAL manager. This WAL record can be used for future recovery of such data.

After the WAL is written, a LSN value will return, which indicates the position of the WAL that it has just written. heap_insert() will then update this LSN value to the data page header via the PageSetLSN() call.

End of Insert

After the tuple data is inserted to a buffer appointed by buffer manager, and the WAL record written with the help from WAL manager, it will then obtain the CTID value from the inserted tuple, clean up its resource and return the CTID value back to executor. A CTID is a special type that tells exactly where the tuple is located (at which block’s which offset). With this, we can quickly fetch a tuple. This marks the end of a tuple insertion in PostgreSQL