Introduction
In previous blogs, we talked about an overview of PostgreSQL’s table access method API here, how sequential scan is handled within this API here, and how data insertion is handled here. Today in this blog, we will look closely into how PostgreSQL handles updates. A successful update in PostgreSQL can be viewed as “inserting a new record” while “marking the old record as invisible”, due to the MVCC technique that PostgreSQL employs. It sounds simple enough, but there are quite a lot of considerations involved in making a successful update. Let’s dive in.
APIs Involved
Two table access method APIs are primarily involved when performing an update; their declarations are sketched after the list below:
- tuple_fetch_row_version(): called to fetch the latest version of the tuple to be updated, identified by the given TID. A snapshot structure is also supplied here so that a visibility check can be performed if applicable. The function is expected to fetch the tuple, convert it into a Tuple Table Slot, and return true if the tuple is fetched. Otherwise, it should return false.
- tuple_update(): the primary handler for processing a tuple update request. Basically, this function is given several parameters to perform an update:
- TID of the old tuple: the location of the old tuple to be updated
- A new tuple expressed as a Tuple Table Slot: PostgreSQL converts this into a HeapTuple to perform the update
- Command ID and snapshot: so we can perform visibility checks on the old tuple to be updated
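For reference, here is roughly how these two callbacks are declared in the TableAmRoutine struct in src/include/access/tableam.h (based on the PostgreSQL 13/14 sources; the exact parameter list varies slightly between major versions):

```c
/* Simplified excerpt from the TableAmRoutine struct in access/tableam.h.
 * Parameter lists may differ slightly depending on the PostgreSQL version. */
bool        (*tuple_fetch_row_version) (Relation rel,
                                        ItemPointer tid,
                                        Snapshot snapshot,
                                        TupleTableSlot *slot);

TM_Result   (*tuple_update) (Relation rel,
                             ItemPointer otid,
                             TupleTableSlot *slot,
                             CommandId cid,
                             Snapshot snapshot,
                             Snapshot crosscheck,
                             bool wait,
                             TM_FailureData *tmfd,
                             LockTupleMode *lockmode,
                             bool *update_indexes);
```

The heap access method implements these with heapam_tuple_fetch_row_version() and heapam_tuple_update() in src/backend/access/heap/heapam_handler.c, which in turn call into heap_fetch() and heap_update().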
The Update Routine
To perform an update, PostgreSQL performs a series of checks and considerations before it can carry out the update. This process is illustrated in the diagram below:
(1) Determine Columns Updated
The very first check the main update routine performs is to determine the columns to be updated, and in particular to find out whether any identity key columns have been updated. These could be primary key, index key, or partition key columns. It needs to know this because it may not need to acquire an exclusive lock for the update if identity key columns have not been updated. Furthermore, there has to be additional handling for replica identity columns for logical replication, to ensure that enough information is logged to the WAL to tell the logical subscriber which row has been updated.
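As a rough sketch of the idea (the helper name and the modified_attrs parameter below are made up for illustration; the real logic lives in heap_update() in heapam.c), the set of modified columns is compared against the relation’s key-column bitmaps:

```c
#include "postgres.h"
#include "nodes/bitmapset.h"
#include "utils/relcache.h"

/*
 * Sketch only: does the set of modified columns touch any index key or
 * replica identity column?  "modified_attrs" is assumed to be a bitmapset
 * of attribute numbers whose values differ between the old and new tuple.
 */
static void
classify_update(Relation rel, Bitmapset *modified_attrs,
                bool *key_changed, bool *id_changed)
{
    Bitmapset  *key_attrs = RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_KEY);
    Bitmapset  *id_attrs = RelationGetIndexAttrBitmap(rel, INDEX_ATTR_BITMAP_IDENTITY_KEY);

    /* a key-column change requires a stronger tuple lock mode ... */
    *key_changed = bms_overlap(modified_attrs, key_attrs);
    /* ... and a replica-identity change means the old key must be WAL-logged */
    *id_changed = bms_overlap(modified_attrs, id_attrs);
}
```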
(2) Determine if Tuple is Updable
This is a crucial step for concurrency control. PostgreSQL is a multi-process system and a tuple may be targeted by multiple client connections at the same time. PostgreSQL, of course, won’t allow a tuple to be updated by multiple clients at the same time, so before we can proceed, we need to make sure there is not another client updating the same tuple as we are. This can easily be checked by looking at the old tuple’s current xmax value in its header (the transaction ID that modified this tuple) and looking up the CLOG or its hint bit flags to determine whether that xmax value has been committed or not. There are several cases to consider here after a call to HeapTupleSatisfiesUpdate():
- Tuple is invisible: This means another client backend has already updated this tuple (marked it as invisible) and has committed the transaction. If this is the case, we are not able to update because there is nothing to update anymore according to the snapshot. So the system shall error out here.
- Tuple is being updated: This means another client backend has already updated this tuple (marked it as invisible) but has not yet committed or rolled back its transaction. If this is the case, we have to wait right here until the other client’s transaction commits or aborts. The system will go into a waiting loop and, at the same time, perform deadlock detection to ensure that the transaction we are waiting for is not also waiting for us to finish (cross waiting, a.k.a. deadlock).
- Tuple can be updated: This means this tuple is not being updated by another client backend and we can proceed to update it (a simplified sketch of this decision follows the list).
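Below is a heavily simplified sketch of how this verdict is acted upon (the helper name is made up, and the real code in heap_update() handles many more cases, such as self-updates, lock-only waiters, and multixacts):

```c
#include "postgres.h"
#include "access/heapam.h"
#include "access/htup_details.h"
#include "access/tableam.h"
#include "storage/lmgr.h"

/*
 * Sketch only: react to the visibility verdict on the old tuple.
 * "oldtup", "cid" and "buffer" are assumed to be set up by the caller,
 * just as heap_update() sets them up.
 */
static TM_Result
check_old_tuple(Relation rel, HeapTuple oldtup, CommandId cid, Buffer buffer)
{
    TM_Result   result = HeapTupleSatisfiesUpdate(oldtup, cid, buffer);

    if (result == TM_Invisible)
        elog(ERROR, "attempted to update invisible tuple");

    if (result == TM_BeingModified)
    {
        /*
         * Another backend is modifying it: sleep until its transaction
         * ends.  XactLockTableWait() also participates in deadlock
         * detection, so a cross-wait will be detected and aborted.
         */
        TransactionId xwait = HeapTupleHeaderGetRawXmax(oldtup->t_data);

        XactLockTableWait(xwait, rel, &oldtup->t_self, XLTW_Update);
        /* heap_update() then re-evaluates the tuple from the top */
    }

    /* TM_Ok: nobody else holds it, we may proceed with the update */
    return result;
}
```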
(3) Prepare New Tuple Header
Once we have confirmed that we can update the old tuple, PostgreSQL will start to prepare the new tuple by converting it from the Tuple Table Slot format to the HeapTuple format, filling in the necessary header information and getting it ready for insertion.
(4) TOAST Check
Once we have prepared the new HeapTuple structure, we need to examine its size against the free space remaining in the buffer page given to us during the update. If the new tuple fits, there is no additional action to perform and we can proceed to the next stage of processing. If the given buffer page cannot hold this new HeapTuple, then we need to perform TOAST, which is a technique in PostgreSQL to break a large piece of data down into smaller chunks and store them separately. The system will call heap_toast_insert_or_update() to complete the TOAST operation and will then generate a WAL record to indicate that a TOAST action has been performed and that a de-TOAST action is needed in order to use the value.
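In recent PostgreSQL versions the check in heap_update() considers both a fixed TOAST threshold and whether the new tuple fits into the page’s free space. A simplified sketch of just the threshold part (the helper name is made up for illustration):

```c
#include "postgres.h"
#include "access/heaptoast.h"
#include "access/htup_details.h"
#include "utils/rel.h"

/*
 * Sketch only: TOAST the new tuple if it is too large, or if either the
 * old or new version already carries out-of-line (external) values.
 */
static HeapTuple
maybe_toast(Relation rel, HeapTuple oldtup, HeapTuple newtup)
{
    bool        need_toast = (HeapTupleHasExternal(oldtup) ||
                              HeapTupleHasExternal(newtup) ||
                              newtup->t_len > TOAST_TUPLE_THRESHOLD);

    if (need_toast)
        /* break oversized attributes into out-of-line TOAST chunks */
        return heap_toast_insert_or_update(rel, newtup, oldtup, 0);

    return newtup;              /* small enough, store it as is */
}
```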
(5) Extract Replica Identity from Old Tuple
This action simply extracts the replica identity from the old tuple (such as the primary key or index key) so that we can include this additional information in the WAL record later to tell the logical replication subscriber exactly which row has been updated. Without this replica identity, the logical replication subscriber is only told that a row has been updated to a new value, but it cannot tell exactly which row was updated.
(6) Page Set Prunable
This basically makes a modification to a buffer page’s header, to indicate that this page will contain a dead tuple because we are currently performing an update on it. This is primarily meant to be looked at by the vacuum process to determine if a page is all visible (meaning every tuple inside is visible) or prunable (meaning there are dead tuples inside this page to vacuum out).
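In the heap code this is just a macro call on the page header; roughly (the wrapper function here is made up for illustration, PageSetPrunable() itself is real):

```c
#include "postgres.h"
#include "access/transam.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"

/*
 * Sketch only: record in the page header that this page may now contain
 * prunable (dead) tuples as of our transaction.
 */
static void
mark_page_prunable(Buffer buffer, TransactionId xid)
{
    Page        page = BufferGetPage(buffer);

    /* stores xid in pd_prune_xid so pruning/VACUUM knows to look here */
    PageSetPrunable(page, xid);
}
```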
(7) Heap Only Tuple (HOT) Update?
This is an optimization employed by PostgreSQL to “save an index tuple” if the conditions are right. It is called a Heap Only Tuple (HOT) update because it literally means there is only a heap tuple, and no index tuple directly associated with it, even though an index is created on the table.
Here are the conditions that trigger HOT to “save an index tuple”:
- no index columns have been updated
- the new tuple can be inserted to the same page as its old tuple.
If the above conditions are satisfied, PostgreSQL will insert the new tuple into the same page as the old tuple and “chain” the old tuple to the new tuple within that page. The index tuple associated with the old tuple still points at the old tuple, but because we have chained the old and new tuples together within the same page, we can still locate the new tuple by first landing on the old one and then following the chain to the new one. This way, PostgreSQL does not have to create a new index tuple that points directly to the new tuple, saving some potential index tuple space.
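The decision in heap_update() essentially boils down to the sketch below (the helper name is made up; the set of index columns comes from RelationGetIndexAttrBitmap(), as in the sketch for step (1)):

```c
#include "postgres.h"
#include "nodes/bitmapset.h"
#include "storage/buf.h"

/*
 * Sketch only: a HOT update is possible when no indexed column changed
 * and the new tuple version landed on the same page as the old one.
 *
 * "modified_attrs" holds the columns whose values changed, and
 * "indexed_attrs" holds the columns referenced by any index; "buffer"
 * and "newbuf" are the buffers holding the old and new tuple versions.
 */
static bool
can_do_hot_update(Bitmapset *modified_attrs, Bitmapset *indexed_attrs,
                  Buffer buffer, Buffer newbuf)
{
    return newbuf == buffer &&
           !bms_overlap(modified_attrs, indexed_attrs);
}
```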
(8) Relation Put Heap Tuple
After all the checks above, we are now ready to put the new tuple into the given page, with the proper tuple headers and HOT flags in place.
(9) Mark Old Tuple Invisible
With the new tuple inserted, we can now mark the old tuple as invisible, simply by setting its xmax value to the current transaction ID and setting its hint bits and the proper HOT flags.
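Conceptually, the old tuple header is stamped like this (a simplified sketch with a made-up helper name; the real code in heap_update() also manipulates several infomask bits and handles combo command IDs):

```c
#include "postgres.h"
#include "access/htup_details.h"

/*
 * Sketch only: make the old tuple version invisible to later transactions
 * and chain it to the new version.
 */
static void
retire_old_version(HeapTuple oldtup, HeapTuple newtup,
                   TransactionId xid, CommandId cid, bool hot_update)
{
    /* xmax = our transaction; readers consult CLOG / hint bits later */
    HeapTupleHeaderSetXmax(oldtup->t_data, xid);
    HeapTupleHeaderSetCmax(oldtup->t_data, cid, false);

    /* t_ctid of the old version points at the new version (update chain) */
    oldtup->t_data->t_ctid = newtup->t_self;

    /* for HOT, also flag the old version as HOT-updated */
    if (hot_update)
        HeapTupleSetHotUpdated(oldtup);
}
```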
(10) Mark Buffer Dirty
This is a buffer manager routine that tells the buffer manager we have modified the content of the page, so that the page must be flushed to disk before its buffer can be evicted. The periodic checkpoint process will also try to flush this dirty page to disk.
(11) Cache Invalidate Heap Tuple
This mechanism is used to notify all active backend processes to invalidate the heap tuple that has just been marked as invisible (deleted) if they have this tuple stored in their local cache.
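Both this step and the previous one are single calls inside heap_update(); roughly (the wrapper function is made up for illustration):

```c
#include "postgres.h"
#include "access/htup.h"
#include "storage/bufmgr.h"
#include "utils/inval.h"
#include "utils/rel.h"

/*
 * Sketch only: after the page has been modified, flag its buffer as dirty
 * and queue an invalidation message so that other backends discard any
 * cached copy of the old tuple (mainly relevant for system catalog tuples).
 */
static void
finish_page_modification(Relation rel, Buffer buffer,
                         HeapTuple oldtup, HeapTuple newtup)
{
    MarkBufferDirty(buffer);                       /* page must be written back */
    CacheInvalidateHeapTuple(rel, oldtup, newtup); /* invalidate cached copies */
}
```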
(12) Update Index Flag
At the end of an update, there is a flag that we need to fill in to tell the executor whether an index tuple needs to be created for the new tuple. In the case of a HOT update, or when there is no index created on the table, we would normally set it to false.
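In the heap implementation this decision is essentially a one-liner in heapam_tuple_update() in heapam_handler.c. The sketch below mirrors its shape for versions where update_indexes is a plain bool (newer versions report a small enum instead); the helper name is made up:

```c
#include "postgres.h"
#include "access/htup_details.h"
#include "access/tableam.h"

/*
 * Sketch only: tell the executor whether it still has to insert index
 * entries for the new tuple version.  A HOT update leaves the existing
 * index entries reaching the new version via the update chain, so no new
 * index tuples are needed.
 */
static void
report_index_needs(TM_Result result, HeapTuple newtup, bool *update_indexes)
{
    *update_indexes = (result == TM_Ok && !HeapTupleIsHeapOnly(newtup));
}
```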
Summary
Unlike insert and sequential scan, an update takes a lot more considerations to complete. It not only has to take account of concurrency control, but also has to consider a series of optimizations, over-sized tuples, and replica identity for logical replication. This makes update a generally more expensive operation to perform.
Cary is a Senior Software Developer in HighGo Software Canada with 8 years of industrial experience developing innovative software solutions in C/C++ in the field of smart grid & metering prior to joining HighGo. He holds a bachelor’s degree in Electrical Engineering from the University of British Columbia (UBC) in Vancouver in 2012 and has extensive hands-on experience in technologies such as: Advanced Networking, Network & Data security, Smart Metering Innovations, deployment management with Docker, Software Engineering Lifecycle, scalability, authentication, cryptography, PostgreSQL & non-relational database, web services, firewalls, embedded systems, RTOS, ARM, PKI, Cisco equipment, functional and Architecture Design.