There are many approaches PostgreSQL can take to retrieve data for the user. Depending on the input query, the planner module is responsible for selecting the optimal approach to retrieve the requested data. Sequential scan is one of these approaches; it is most often selected when the user requests a large volume of data (for example, “SELECT * FROM tablename;”) or when a table has no index declared. Sequential scan is handled by the table access method API within PostgreSQL, and the
heap access method is the default one PostgreSQL uses today. In this short post, I will show you how sequential scan is done in the table access method API.
2. Table Access Method APIs in PostgreSQL
The pluggable table access method API has been available since PostgreSQL 12; it allows a developer to redefine how PostgreSQL stores and retrieves table data. The API contains a total of 42 routines that need to be implemented to complete a table access method, and honestly, it is no easy task to understand and implement them all. The API structure is defined in
tableam.h under the name
typedef struct TableAmRoutine
Today I will describe the routines related to sequential scan, and I hope this helps if you are looking to create your own table access method.
3. Sequential Scan Overall Call Flow
Only a few of the 42 routines are called by the executor to complete a sequential scan request. This section describes these routines in the order they are called.
uint64 (*relation_size) (Relation rel, ForkNumber forkNumber);
relation_size is the first routine to be called, and it is relatively simple. The caller expects it to return the total size (in bytes) of the fork, identified by
forkNumber, of the given relation. The default
heap access method simply asks the storage manager
smgr for the number of data blocks this particular relation physically occupies on disk and multiplies that number by the size of each block,
BLCKSZ (8kB by default). If you are not sure about the relationship between a relation and its fork number, you can refer to this blog for more information.
The size returned by this routine basically sets the boundary of our sequential scan.
const TupleTableSlotOps *(*slot_callbacks) (Relation rel);
Next, the executor needs to find out which set of tuple table slot (TTS) callback operations this table access method is compatible with. TTS is a set of routines that ensures the tuple storage is compatible between the executor and your access method. The executor will invoke the TTS callbacks to
translate your tuple structure into the
TupleTableSlot format that the executor understands. The default
heap access method uses
TTSOpsBufferHeapTuple, defined in
execTuples.c, to handle this operation.
TableScanDesc (*scan_begin) (Relation rel, Snapshot snapshot, int nkeys, struct ScanKeyData *key, ParallelTableScanDesc pscan, uint32 flags);
Now the scan can officially begin. This is sequential scan’s initialization routine, in which it allocates a new scan descriptor using the parameters passed in by the executor. The purpose of the scan descriptor structure is to keep track of the sequential scan while it is being executed: for example, where the scan should begin, the block number last scanned, which block to resume scanning from, how many blocks have been scanned so far, and so on. The scan descriptor is destroyed once the sequential scan has completed.
The executor expects the routine to return a fully allocated and initialized pointer to a TableScanDesc structure.
bool (*scan_getnextslot) (TableScanDesc scan, ScanDirection direction, TupleTableSlot *slot);
This is the main routine of the sequential scan, in which the caller expects the routine to fetch one tuple from the buffer manager, convert it to the TTS format that the executor understands, and save it in the input pointer called
slot. Each call to this routine results in one tuple being returned; if a table contains 1000 tuples, this function will be called 1000 times. The boolean return code tells the caller whether the routine has more tuples to return: as soon as
false is returned, it signals the executor that we have exhausted all the tuples and it should stop calling this function.
In the normal sequential scan case, this routine works in per-page mode. This means it reads one full block from the buffer manager and records all the tuple addresses and offsets in the scan descriptor, so subsequent calls to the same function do not have to load the full page from the buffer manager every time; it only loads the next block once all the tuples on the current block have been scanned and returned.
As you can see, the scan descriptor plays an important role here, as most of the control information is saved there and is updated whenever a call to
scan_getnextslot is made.
void (*scan_end) (TableScanDesc scan);
This is the last routine to be called; it basically cleans up the table scan descriptor, which was used heavily during the sequential scan. At this point, the executor should already have all the tuple information from the sequential scan routines.
4. Prepare the Data to Return
Now the executor is finished with the table access method and has access to all the tuples for a particular relation. It then needs to go through another round of filtering to determine which of these tuples satisfy the condition set by the user (for example, when the user gives a WHERE clause to limit the scan results). This is done in another infinite
for loop in
execScan.c, which runs
ExecQual on each TTS. Finally, the end results are sent to the user.
What we have discussed here is the basic call flow of a simple sequential scan. If we were to visualize the process, it should look something like this:
Cary is a Senior Software Developer in HighGo Software Canada with 8 years of industrial experience developing innovative software solutions in C/C++ in the field of smart grid & metering prior to joining HighGo. He holds a bachelor’s degree in Electrical Engineering from the University of British Columbia (UBC) in Vancouver, earned in 2012, and has extensive hands-on experience in technologies such as: Advanced Networking, Network & Data security, Smart Metering Innovations, deployment management with Docker, Software Engineering Lifecycle, scalability, authentication, cryptography, PostgreSQL & non-relational database, web services, firewalls, embedded systems, RTOS, ARM, PKI, Cisco equipment, functional and Architecture Design.