1. Overview I have been working on an internal project based on PostgreSQL for a while, and from time to time, I need to run some specific test cases to verify my changes. Here, I want to shared a tip to run a specific regression TAP test quickly, especially, when you are focusing on a […]
1. Introduction In my previous blog here, I discussed about PostgreSQL’s point in time recovery where PostgreSQL supports an ability to recover your database to a specific time, recovery point or transaction ID in the past but I did not discuss in detail the concept of timeline id, which is also important in database recovery. […]
1. Overview When I was working on some backup and recovery related features for a project based on Postgres, I noticed that there is file called backup_label. By quickly google search, you can find some very nice blogs or books which discussed this topic, such as, The Internals of PostgreSQL, one of my favourite books. […]
1. Introduction Recently I have been practicing the internals of PostgreSQL continuous archiving and the point in time recovery features that are required for my current development work. Today I would like to demonstrate these important features with the recently released PostgreSQL 14 on Ubuntu 18.04. 2. Write Ahead Log? Before going into the details […]
1. Overview In my previous blog, I briefly walked through how the bulk/batch insertion was done for postgres_fdw in PG14. In this blog, I am going to run some basic tests to compare the performance for before and after the batch insertion was introduced in postgres_fdw, so that we can have a general idea about […]
1. Introduction Recently I have been tasked to familiarize myself with the Foreign Data Wrapper (FDW) interface API to build a new FDW capable of doing vertical / columnar sharding, meaning that the FDW is capable of collecting column information from multiple sources and combine them together as a result query. I will document and […]
Load balancing increases the system throughput from the application standpoint by distributing the read load from application to multiple replicated database nodes. Some time ago, I wrote a blog discussing effectiveness of Pgpool-II load balancer in different scenarios. Pgpool-II provides one of the best user-friendly load balancers for PostgreSQL that does not require any modification […]
1. Overview postgres_fdw has been existing in PostgreSQL for many years, and it was one of the most popular interfaces used in many extensions. such as the PostgreSQL foreign data wrappers wiki page, and PostgreSQL Extension Network/PGXN. However, I had not touched this postgres_fdw in deep until I got a task about combing multiple columns […]
So for one reason or another you have decided that you need horizontal scalability and you need to achieve it with PostgreSQL as your database. So what options do you have? The PostgreSQL database supports vertical scalability and can run on bigger and faster machines to increase the performance. But when it comes to horizontal […]
1. What Is Security? The word “Security” is a very broad concept and could refer to completely different procedures and methodology to achieve. Knowing what security means to your application is very important, so you could execute proper security practices and procedures to ensure the safety of your company’s assets. Data compromises could often lead […]
1. Overview I was working on the PostgreSQL storage related features recently, and I found PostgreSQL has designed an amazing storage addressing mechanism, i.e. Buffer Tag. In this blog, I want to share with you my understanding about the Buffer Tag and some potential usage of it. 2. Buffer Tag I was always curious to […]
During my most recent expedition of going through PostgreSQL source code, I ran into some mysterious variables in the page structure of some indexes. I will explain later why i am calling these mysterious but first have a look at these variables… typedef struct HashPageOpaqueData{ … uint16 hasho_page_id; /* for identification of hash indexes */} […]
1. Introduction I am working on a new PostgreSQL feature that redefines the way a tuple’s visibility status is determined. The feature is working very nicely until I start doing a large SELECT query, which triggers PostgreSQL to spawn multiple parallel workers to process the request. When this happens, the feature I am working on […]
I have spent a few days trolling through the features added to PostgreSQL 14 and in this blog, I want to share some statistics about PG-14 contributions, hopefully, you will find these statistics interesting. Please note that this data is based on my research on PG 14 contributions by going through the GIT LOG, commitfest […]
Back in August 2019, I wrote about the Horizontal scaling / Sharding in PostgreSQL and where it is going https://www.highgo.ca/2019/08/08/horizontal-scalability-with-sharding-in-postgresql-where-it-is-going-part-3-of-3/. One of the key feature that i talked about in this blog is parallel foreign scan i.e. asynchronous append of FDW nodes. Finally the first patch of this feature was committed to PG-14, while they […]
pg_bulkload is a high-speed data loading tool for PostgreSQL, compared with the copy command. The biggest advantage is speed. In the direct mode of pg_bulkload, it will skip shared buffer and WAL buffer, writing files directly. It also includes the data recovery function to recover in case of import failure. GitHub: https://github.com/ossc-db/pg_bulkload pg_bulkload mainly […]
This is the second of the two-part series of blogs about PostgreSQL data corruption. Part-1 discussed identifying the data corruption in PostgreSQL and lists some DO’s and DON’Ts for avoiding that. In this part, I will attempt to walk you through possible ways to recover from the corruption and ways to salvage data in case of […]
Indexes are key database server features that enhance its performance to retrieve data faster than a complete table scan (unless the index is vastly bloated). They generally work by maintaining a smaller set of data in a more “searchable” structure as compared to the complete table data. This data organization comes at the cost of […]
Back in Nov 2021, I wrote about calling Stored procedure from Java and demonstrated how we can call a simple stored procedure from Java or a one that contains a IN parameter. I have done some more playing around with calling Stored procedure from Java so this blog is about calling a stored procedure with […]
There are many advantages to using bind variables in SQL statements, especially in conditional statements. Using bind variables can save the trouble to parse and optimize the query in every execution. There are also disadvantages such as bind variable will make optimizer lost its prediction accuracy and cause performance decrease. In PostgreSQL are represented by […]
A hierarchical query is built upon parent-child relationship, the relationship exist in the same table or view. The relationship dictates that each child can have one parent while a parent can have many children. Hierarchical query is a SQL query that handles data of hierarchical model i.e. an organisation structure where every employee has one […]
Bloating in database is created when tables or indexes are updated, an update is essentially a delete and insert operation. The diskspace used by the delete is available for reuse but it is not reclaimed hence creating the bloat. Same is the case with PostgreSQL database, frequent UPDATE and DELETE operations can leave a lot […]
A key aspect of maintaining a high performance database is continuous monitoring of system resources, as any issues anywhere on the system can potentially lead to downtime. There are options aplenty for monitoring, so reinventing the wheel doesn’t make much sense. For that purpose, I’ve selected a couple of options that are simple to deploy, […]
1. Overview PostgreSQL supports many System Information Functions, such as Session Information Functions, Access Privilege Inquiry Functions, Schema Visibility Inquiry Functions, System Catalog Information Functions, Transaction ID and Snapshot Information Functions, etc. However, you may want build some special functions and integrate them into the PostgreSQL. This blog is going to walk through the steps […]
1. Introduction As an experienced PostgreSQL user, you may have a lot of experience in setting up streaming replication in your database clusters to make multiple backups of your data. But have you wondered how the standby is able to correctly determine if a tuple sent from the primary should be visible to the user […]
PostgreSQL is one of the most advanced open source databases globally with a lot of great features. One of them is Streaming Replication which was introduced in PostgreSQL 9.0—known as Physical Replication. It works by sending the WAL segments generated by the primary PostgreSQL to another standby server also running PostgreSQL. However, if we only […]
I have spent sometime exploring all the SQL command changes in PostgreSQL 14, this blog will provide a brief summary about the SQL changes along with some examples. There are exciting new features that in progress for PG-14, this blog will only focus on the SQL changes that are committed and will be released with […]
Quite a few managers or clients expect DBAs to type away a few commands to fix a sluggish PostgreSQL instance. If only things were that simple. Unfortunately, debugging a sluggish instance that was previously running optimally requires analysis at many levels. In this blog, I will attempt to help that debugging process on Linux by […]
1. Introduction This blog talks about a high level description of the mechanism behind PostgreSQL to execute an INSERT query. This process involves many steps of processing before the data is put in the right place. These process normally involves several catalog cache lookup to determine if the destination table exists or several checking on […]
1. Overview PostgreSQL supports different type of replications, i.e. logical and physical, and there are many tutorials discussed about the replications. This blog is a simply walk-through of the WAL Streaming replication using the latest Postgresql-13 on Ubuntu 18.04. 2. Install Postgresql-13 from source code In this blog, we will install Postgresql from the source […]
All the things that can happen to databases, corruption is not the one that happens frequently, but when it does it can result in major downtime and data loss. Probably it is one of the most terrifying things for a DBA to come across and the best way to deal with it is to not […]
In computing, a hash table (hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value […]
1. Introduction There are many approaches for PostgreSQL to retrieve the data back to the user. Depending on the user’s input query, the planner module is responsible for selecting the most optimum approach to retrieve the requested data. Sequential scan is one of these approaches that is mostly selected when the user requests a large […]
A table in PostgreSQL has a relfilenode value, which specifies the file name of the table on disk (except foreign table and partition table). In general, this value can be found in the relfilenode field of the pg_class table, but there are some specific tables whose query result in the relfilenode field is 0. This […]
PostgreSQL database writes all changes happening in the database to a log file before it writes them to the actual data files. These log files are called WAL (Write-Ahead Log). Until these changes are flushed to the disk, they are kept in memory and is returned from the memory when it’s asked for. Writing to […]
1. Overview PostgreSQL is a great open source database developed and maintained by many great software engineers around the world. In each release, there are many features added to this open source database. For example, one feature is very helpful for developer is backtrace_functions introduced in PostgreSQL 13 , which allows a developer to dump […]
TDE is an important feature missing from PostgreSQL, it is also a crucial tick in the box which people are deciding to move to PostgreSQL from other DBMS systems. The community has been trying to add this feature in PG however it has taken too long because of lack of attention from serious community members […]
If you’ve seen the following error, you know that your CentOS6 has reached end of life (EOL). Loaded plugins: fastestmirror, ovl Setting up Install Process YumRepo Error: All mirror URLs are not using ftp, http[s] or file. Eg. Invalid release/repo/arch combination/ removing mirrorlist with no valid mirrors: /var/cache/yum/x86_64/6/base/mirrorlist.txt Error: Cannot retrieve repository metadata (repomd.xml) for […]
You have a file, possibly a huge CSV, and you want to import its content into your database. There are lots of options to do this but how would you decide which one to use. More often than not the question is how much time would the bulk load would take. I found my self doing the same […]
When using logical replication with PostgreSQL, the wal level needs to be set to ‘logical’, so the logical level wal contains more data to support logical replication than the replicate wal level. I think it’s unclear to many users or DBAs about the difference between logical and replicate level. This blog is about the difference […]
My previous blog on “PostgreSQL High Availability: Considerations and Candidates” mostly talked about defining an HA considerations for PostgreSQL, RPO and RTO and briefly touched on some of the open source solutions available. Granted that I didn’t list them all and people shared some additional ones that I may end up reviewing as well. In […]
This is my third blog about Stored Procedure support in PostgreSQL, the previous two blogs are reachable from the HighGo CA blogs site https://www.highgo.ca/author/ahsan-h/. The first blog was introduction and usage of Stored Procedures and its difference with Stored Functions, the second blog focussed on creating and using Procedures with Definer and Invoker rights. The […]
Database monitoring is a very essential for most of the database system, it can prevent system crises, improve performance, database failure and auditing the traffic for identify any unwanted access. PostgreSQL has an extension called pg_stat_statements, which can capture a lot of query execution details. Recently, percona release a new extension called pg_stat_monitor which can […]
1. Introduction In this blog post, I will talk about how to enable the generation of crash dump file (also known as core dump) and some common GDB commands to help a developer troubleshoot a crash-related issues within PostgreSQL and also other applications. Proper analysis of the issue normally will take time and certain degree […]
1. Overview Previously, we discussed the MAIN fork file and corresponding extension at Heap file and page in details. This blog will explain a little bit more about the Free Space Mapping file and corresponding extension. 2. What is a Free Space Mapping file A Free Space Mapping, FSM, file is a file that keeps […]
When inserting data into a table, you can specify that the value of a field is null. For example, for a table t (i int, j int, k int), we can execute insert into t values (8,1,6), or insert into t values (3, null, 7) to insert a record with a null value. This blog […]
1. Overview PostgreSQL is a great open source database, and many users chose it because of the efficiency of its central algorithms and data structures. As a software developer, I was always curious about how each part was done, such as the physical files storage. The reason is that I always see a lot of […]
Watchdog is the high availability component of Pgpool-II. Over the past few releases watchdog has gotten a lot of attention from the Pgpool-II developer community and received lots of upgrades and stability improvements. One of the not very strong areas of pgpool-II watchdog is its configuration interface. Watchdog cluster requires quite a few config settings […]
WAL is short for Write-Ahead-Log. Any change to the data is first recorded in a WAL file. The WAL files are mainly used by RDBMS as a way to achieve durability and consistency while writing data to storage systems. Before we move forward, let’s first see why we need a WAL archiving and Point in […]
The pgpool II community is gearing up to release the Alpha version of its next major release; pgpool II 4.2. It is going to be another exciting release of pgpool II that is a middleware product and provides mission critical functionality like load balancing, high availability, connection pooling etc for PostgreSQL server. We have written […]