1. Overview Followed my previous blog, Global Index, a different approach, we posted our initial Global Unique Index POC to Postgres community for open discussion about this approach. Now, this blog is trying to explain how the benchmark was performed using pgbench on this initial Global Unique Index POC. 2. Build global index Before running […]
1.0 Introduction My colleague, David, recently published a post “Global Index, a different approach” that describes the work that we are doing to implement global unique index in an approach that does not change current PostgreSQL’s partitioning framework, while allowing cross-partition uniqueness constraint. To implement this, we must first know how PostgreSQL currently ensures uniqueness […]
1. Overview A few years ago, there was a proposal about adding the global index support to PostgreSQL for partitioned table. Following that proposal, there were many discussions and also an initial version POC to demonstrate the possibility, the technical challenges and the potential benefits, etc. However, the global index feature is still not available […]
1.0 Introduction Recently I have been involved in creating a solution that guarantees cross-partition uniqueness within a partitioned table consisting multiple child tables. Some people refer to this feature as a global index and there has been some discussion about it in the email thread here. Though the idea is good, the approach sparks a […]
Introduction Network File System (NFS) is a distributed file system protocol that allows a user on a client node to access files residing on a server node over network much like local storage is accessed. Today in this blog, I will share how to set up both NFSv4 server and client on CentOS7 and run […]
1. Overview Similar to PostgreSQL, Lustre file system is also an open source project which started about 20 years ago. According to Wikipedia, Lustre file system is a type of parallel distributed file system, and is designed for large-scale cluster computing with native Remote Direct Memory Access (RDMA) support. Lustre file systems are scalable and […]
1. Introduction Recently in my development work, a custom connection is required to be maintained between a PG backend on primary and another PG backends on standby nodes to communicate custom data in addition to the existing walsender/walreceiver connection that streams WAL data. Of course, I could just create a new standalone backend and maintain […]
1. Overview PostgreSQL is a great open source project for many reasons. One of the reasons I like it is because of the design of buffer blocks addressing. In this blog, I am going to explain a possible way to share a Primary’s buffer blocks with a Standby. If you want to know more about […]
1. Introduction PostgreSQL’s MultiVersion Concurrency Control (MVCC) is an “advanced technique for improving database performance in a multi-user environment” according to Vadim Mikheev. This technique requires multiple “versions” of the same data tuple exist in the system governed by snapshots taken during different time periods. In other words, under such technique, it is PG’s responsibility […]
1. Overview PostgreSQL is a very popular open-source relational database management system, and it is widely used in many different production environments. To maintain the production environment always functioning, you need to a lot tools, and one of the tools must to have been backup and restore. This blog is going to introduce one backup […]
1. Introduction PostgreSQL’s 2 phase commit (2PC) feature allows a database to store the details of a transaction on disk without committing it. This is done by issuing PREPARE TRANSACTION [name] command at the end of a transaction block. When the user is ready to commit, he/she can issue COMMIT PREPARED [name] where [name] should […]
The MERGE statement is one of the long awaited features and it’s coming in the upcoming Major version of PostgreSQL 15. Although the PostgreSQL 15 release is quite a distance away, the MERGE statement patch has been committed to the development branch and it should become available as early as beta release around May, 2022. […]
1. Overview Nowadays, supporting distributed transactions is a typical requirement for many use cases, however, the global deadlock detection is one of the key challenging issues if you plan to use PostgreSQL to setup a distributed database solution. There are many discussions about global deadlock, but this blog will provide you a step-by-step procedure about […]
1. Overview PostgreSQL is one of the greatest open source databases, not only because of the extensibility and SQL compliance but also the evolution of new features. For example, in postgres_fdw, there is a new feature parallel commit has been added into the main branch and will be released in PG15. This blog is for […]
1. Introduction If you are into distributed database research, especially one that is setup using Foreign Data Wrapper (FDW) + partitioned foreign tables, you probably have heard that there are many potential issues associated with this setup. Atomic commit, atomic visibility and global deadlock detection are one of the most popular issues that one can […]
1. Overview Recently, I was working on an internal issue related with buffer manager in PostgreSQL, and I saw a typical use of the Lightweight lock in buffer manager like below. Basically, when the buffer manger needs to access a buffer block using buffer tag, it will have to acquire a lightweight lock in either […]
1. Introduction Last year I wrote a blog about PostgreSQL’s timeline concept, which is essential for executing Point In Time Recovery (PITR) back to a particular timeline and particular Log Sequence Number (LSN). But we have not talked about the idea of LSN in which everything else is built upon. Today in this blog, I […]
1. Overview I recently investigated one internal issue which was related with snapshot and found there were some changes on transaction id and snapshot information functions in PostgreSQL. Here, I am trying to share what I have learned. Before PostgreSQL 13, all transaction id and snapshot related public functions were named as txid_xxx_yyy, for example,txid_current(), […]
1. Introduction Few weeks ago, I was tasked to have a detailed look inside PostgreSQL’s extended query protocol and study its internal mechanisms for a project that depends on this particular feature. In this blog, I will explain how extended protocol works in my own words and how it differs from simple query. 2. Simple […]
1. Overview I have been working on an internal project based on PostgreSQL for a while, and from time to time, I need to run some specific test cases to verify my changes. Here, I want to shared a tip to run a specific regression TAP test quickly, especially, when you are focusing on a […]
1. Introduction In my previous blog here, I discussed about PostgreSQL’s point in time recovery where PostgreSQL supports an ability to recover your database to a specific time, recovery point or transaction ID in the past but I did not discuss in detail the concept of timeline id, which is also important in database recovery. […]
1. Overview When I was working on some backup and recovery related features for a project based on Postgres, I noticed that there is file called backup_label. By quickly google search, you can find some very nice blogs or books which discussed this topic, such as, The Internals of PostgreSQL, one of my favourite books. […]
1. Introduction Recently I have been practicing the internals of PostgreSQL continuous archiving and the point in time recovery features that are required for my current development work. Today I would like to demonstrate these important features with the recently released PostgreSQL 14 on Ubuntu 18.04. 2. Write Ahead Log? Before going into the details […]
1. Overview In my previous blog, I briefly walked through how the bulk/batch insertion was done for postgres_fdw in PG14. In this blog, I am going to run some basic tests to compare the performance for before and after the batch insertion was introduced in postgres_fdw, so that we can have a general idea about […]
1. Introduction Recently I have been tasked to familiarize myself with the Foreign Data Wrapper (FDW) interface API to build a new FDW capable of doing vertical / columnar sharding, meaning that the FDW is capable of collecting column information from multiple sources and combine them together as a result query. I will document and […]
Load balancing increases the system throughput from the application standpoint by distributing the read load from application to multiple replicated database nodes. Some time ago, I wrote a blog discussing effectiveness of Pgpool-II load balancer in different scenarios. Pgpool-II provides one of the best user-friendly load balancers for PostgreSQL that does not require any modification […]
1. Overview postgres_fdw has been existing in PostgreSQL for many years, and it was one of the most popular interfaces used in many extensions. such as the PostgreSQL foreign data wrappers wiki page, and PostgreSQL Extension Network/PGXN. However, I had not touched this postgres_fdw in deep until I got a task about combing multiple columns […]
So for one reason or another you have decided that you need horizontal scalability and you need to achieve it with PostgreSQL as your database. So what options do you have? The PostgreSQL database supports vertical scalability and can run on bigger and faster machines to increase the performance. But when it comes to horizontal […]
1. What Is Security? The word “Security” is a very broad concept and could refer to completely different procedures and methodology to achieve. Knowing what security means to your application is very important, so you could execute proper security practices and procedures to ensure the safety of your company’s assets. Data compromises could often lead […]
1. Overview I was working on the PostgreSQL storage related features recently, and I found PostgreSQL has designed an amazing storage addressing mechanism, i.e. Buffer Tag. In this blog, I want to share with you my understanding about the Buffer Tag and some potential usage of it. 2. Buffer Tag I was always curious to […]
During my most recent expedition of going through PostgreSQL source code, I ran into some mysterious variables in the page structure of some indexes. I will explain later why i am calling these mysterious but first have a look at these variables… typedef struct HashPageOpaqueData{ … uint16 hasho_page_id; /* for identification of hash indexes */} […]
1. Introduction I am working on a new PostgreSQL feature that redefines the way a tuple’s visibility status is determined. The feature is working very nicely until I start doing a large SELECT query, which triggers PostgreSQL to spawn multiple parallel workers to process the request. When this happens, the feature I am working on […]
I have spent a few days trolling through the features added to PostgreSQL 14 and in this blog, I want to share some statistics about PG-14 contributions, hopefully, you will find these statistics interesting. Please note that this data is based on my research on PG 14 contributions by going through the GIT LOG, commitfest […]
Back in August 2019, I wrote about the Horizontal scaling / Sharding in PostgreSQL and where it is going https://www.highgo.ca/2019/08/08/horizontal-scalability-with-sharding-in-postgresql-where-it-is-going-part-3-of-3/. One of the key feature that i talked about in this blog is parallel foreign scan i.e. asynchronous append of FDW nodes. Finally the first patch of this feature was committed to PG-14, while they […]
pg_bulkload is a high-speed data loading tool for PostgreSQL, compared with the copy command. The biggest advantage is speed. In the direct mode of pg_bulkload, it will skip shared buffer and WAL buffer, writing files directly. It also includes the data recovery function to recover in case of import failure. GitHub: https://github.com/ossc-db/pg_bulkload pg_bulkload mainly […]
This is the second of the two-part series of blogs about PostgreSQL data corruption. Part-1 discussed identifying the data corruption in PostgreSQL and lists some DO’s and DON’Ts for avoiding that. In this part, I will attempt to walk you through possible ways to recover from the corruption and ways to salvage data in case of […]
Indexes are key database server features that enhance its performance to retrieve data faster than a complete table scan (unless the index is vastly bloated). They generally work by maintaining a smaller set of data in a more “searchable” structure as compared to the complete table data. This data organization comes at the cost of […]
Back in Nov 2021, I wrote about calling Stored procedure from Java and demonstrated how we can call a simple stored procedure from Java or a one that contains a IN parameter. I have done some more playing around with calling Stored procedure from Java so this blog is about calling a stored procedure with […]
There are many advantages to using bind variables in SQL statements, especially in conditional statements. Using bind variables can save the trouble to parse and optimize the query in every execution. There are also disadvantages such as bind variable will make optimizer lost its prediction accuracy and cause performance decrease. In PostgreSQL are represented by […]
A hierarchical query is built upon parent-child relationship, the relationship exist in the same table or view. The relationship dictates that each child can have one parent while a parent can have many children. Hierarchical query is a SQL query that handles data of hierarchical model i.e. an organisation structure where every employee has one […]
Bloating in database is created when tables or indexes are updated, an update is essentially a delete and insert operation. The diskspace used by the delete is available for reuse but it is not reclaimed hence creating the bloat. Same is the case with PostgreSQL database, frequent UPDATE and DELETE operations can leave a lot […]
A key aspect of maintaining a high performance database is continuous monitoring of system resources, as any issues anywhere on the system can potentially lead to downtime. There are options aplenty for monitoring, so reinventing the wheel doesn’t make much sense. For that purpose, I’ve selected a couple of options that are simple to deploy, […]
1. Overview PostgreSQL supports many System Information Functions, such as Session Information Functions, Access Privilege Inquiry Functions, Schema Visibility Inquiry Functions, System Catalog Information Functions, Transaction ID and Snapshot Information Functions, etc. However, you may want build some special functions and integrate them into the PostgreSQL. This blog is going to walk through the steps […]
1. Introduction As an experienced PostgreSQL user, you may have a lot of experience in setting up streaming replication in your database clusters to make multiple backups of your data. But have you wondered how the standby is able to correctly determine if a tuple sent from the primary should be visible to the user […]
PostgreSQL is one of the most advanced open source databases globally with a lot of great features. One of them is Streaming Replication which was introduced in PostgreSQL 9.0—known as Physical Replication. It works by sending the WAL segments generated by the primary PostgreSQL to another standby server also running PostgreSQL. However, if we only […]
I have spent sometime exploring all the SQL command changes in PostgreSQL 14, this blog will provide a brief summary about the SQL changes along with some examples. There are exciting new features that in progress for PG-14, this blog will only focus on the SQL changes that are committed and will be released with […]
Quite a few managers or clients expect DBAs to type away a few commands to fix a sluggish PostgreSQL instance. If only things were that simple. Unfortunately, debugging a sluggish instance that was previously running optimally requires analysis at many levels. In this blog, I will attempt to help that debugging process on Linux by […]
1. Introduction This blog talks about a high level description of the mechanism behind PostgreSQL to execute an INSERT query. This process involves many steps of processing before the data is put in the right place. These process normally involves several catalog cache lookup to determine if the destination table exists or several checking on […]
1. Overview PostgreSQL supports different type of replications, i.e. logical and physical, and there are many tutorials discussed about the replications. This blog is a simply walk-through of the WAL Streaming replication using the latest Postgresql-13 on Ubuntu 18.04. 2. Install Postgresql-13 from source code In this blog, we will install Postgresql from the source […]
All the things that can happen to databases, corruption is not the one that happens frequently, but when it does it can result in major downtime and data loss. Probably it is one of the most terrifying things for a DBA to come across and the best way to deal with it is to not […]