In previous blogs, we talked about an overview of PostgreSQL’s table access method API here , and how sequential scan is handled within this API here. Today we will explore how data insertion is handled by the table access method API.
The main API for handling insertion is called
tuple_insert, which points to the
heapam_tuple_insert() function as
heap is the default access method set by PostgreSQL. A tuple can be understood as a row in a table in which we are trying to insert. Both terms will be used interchangeably below.
void (*tuple_insert) (Relation rel, TupleTableSlot *slot,
CommandId cid, int options,
struct BulkInsertStateData *bistate);
Relation routine represents the table that the data is trying to insert, which normally contains the OID of the table, the table’s
tupleDescriptor structure which describes all the columns and their respective data types.
TupleTableSlot (TTS) contains the actual row data represented in a format understood by the PostgreSQL internal module such as the executor. One of the responsibilities of
heapam_tuple_insert() routine is to convert TTS into the
heap format in which the heap access table can understand. This is the format that will be stored in the buffer manager module and eventually in the physical disk.
CommandId field represents the command ID, which is the sequence ID in which this row is inserted within a transaction. The lower the command ID, the earlier they are inserted.
The options contains additional flags that should be included in the row and BulkInsertStateData is a control structure for bulk-insertion, which we will not talk in detail today.
Involvement of Other Modules
The heap access method itself is not sufficient to complete a data insertion. It relies on
- Buffer Manager – to determine the placement of this new row (which block at which offset)
- Wal Manager – to write a WAL record indicating this insertion event (for future recovery purposes)
- Storage Manager – invoked by buffer manager to physically read or write a block from or to the disk
heap_tuple_insert is where TTS data is converted to
Heap Tuple and it passes this data format to
heap_insert() to handle the rest.
Several things happen inside
This routine prepares the header of this tuple. A header normally contains:
- xmin – the transaction ID that inserts this tuple (that is us! So we fill this with the value returned by
- xmax – the transaction ID that removes this tuple. We are inserting, so no one is deleting it, so normally we mark it 0 here.
- cid – the command ID of this tuple. This is given to us from the executor.
- table OID – the OID of the table. This is also given to us from the executor.
- Other flags information such as if the tuple is a TOAST tuple or a FROZEN tuple. We will discuss these next time.
Interaction with Buffer Manager
We should have the heap tuple built at this point, we need a place to put it. Heap access method relies on buffer manager module to find a place to store this tuple.
RelationGetBufferForTuple() routine and provide the size of the tuple we have just built. Based on the size of the tuple, The buffer manager will find a suitable location (data block, also called a data page). It relies on FSM (Free Space Map) to determine which block has enough space to put this tuple. If exiting blocks cannot satisfy, the buffer manager will request to
extend its storage via storage manager, which means to create a new block of data just for this tuple.
Once we have a buffer location from buffer manager,
heap_insert() will then call
RelationPutHeapTuple() to actually put the heap data into the buffer location specified.
heap_insert() marks this block as “dirty”, because we have just added some data to it. A dirty block means that this block has a new update and has not yet been flushed onto the disk. During the next checkpoint period, dirty blocks will be flushed to disk first.
Interaction with WAL Manager
heap_insert() has written some data to a buffer block, it will use WAL manager routines to populate a proper WAL record and insert a new WAL record via the WAL manager. This WAL record can be used for future recovery of such data.
After the WAL is written, a LSN value will return, which indicates the position of the WAL that it has just written.
heap_insert() will then update this LSN value to the data page header via the
End of Insert
After the tuple data is inserted to a buffer appointed by buffer manager, and the WAL record written with the help from WAL manager, it will then obtain the
CTID value from the inserted tuple, clean up its resource and return the
CTID value back to executor. A CTID is a special type that tells exactly where the tuple is located (at which block’s which offset). With this, we can quickly fetch a tuple. This marks the end of a tuple insertion in PostgreSQL
Cary is a Senior Software Developer in HighGo Software Canada with 8 years of industrial experience developing innovative software solutions in C/C++ in the field of smart grid & metering prior to joining HighGo. He holds a bachelor degree in Electrical Engineering from University of British Columnbia (UBC) in Vancouver in 2012 and has extensive hands-on experience in technologies such as: Advanced Networking, Network & Data security, Smart Metering Innovations, deployment management with Docker, Software Engineering Lifecycle, scalability, authentication, cryptography, PostgreSQL & non-relational database, web services, firewalls, embedded systems, RTOS, ARM, PKI, Cisco equipment, functional and Architecture Design.