Blog

Enterprise PostgreSQL Solutions

While managing a small team of development resources working on PostgreSQL development, I sometimes get resources in my team that have good development experience but are new to PostgreSQL. I have developed a short set of training instructions in order to get these resources started with PostgreSQL and get them to familiarise themselves with Postgres and its internals. The purpose of this blog is to share these instructions so it can benefit others in a similar situation. The instructions involve

1. Overview In this article we will use GDB debugger to trace the internals of Postgres and observe how an input query passes through several levels of transformation (Parser -> Analyzer -> Rewriter -> Planner -> Executor) and eventually produces an output. This article is based on PG12 running on Ubuntu 18.04, and we will use a simple SELECT query with ORDER BY , GROUP BY, and LIMIT keywords to go through the entire query processing tracing. 2. Preparation GDB

1. Overview Postgres is a huge database system consisting of a wide range of built-in data types, functions, features and operators that can be utilized to solve many common to complex problems. However, in the world full of complex problems, sometimes these are just not enough depending on the use case complexities. Worry not, since Postgres version 9, it is possible to extend Postgres’s existing functionalities with the use of “extensions” In this article, I will show you how to

1. Overview Table partitioning is introduced after Postgres version 9.4 that provides several performance improvement under extreme loads. Partitioning refers to splitting one logically large table into smaller pieces, which in turn distribute heavy loads across smaller pieces (also known as partitions). There are several ways to define a partition table, such as declarative partitioning and partitioning by inheritance. In this article we will focus on a simple form of declarative partitioning by value range. Later in this article, we

I have been working with the PostgreSQL community recently to develop TDE (Transparent Data Encryption). During this time, I studied some cryptography-related knowledge and used it to combine with the database. I will introduce the TDE in PostgreSQL by the following three dimensions. The current threat model of the databaseEncryption policy description and current design status of the current PostgreSQL communityFuture data security What is TDE? Transparent Data Encryption (often abbreviated to TDE) is a technology employed by Microsoft, IBM and Oracle to encrypt database files. TDE offers encryption at file level. TDE solves the problem of protecting data at rest, encrypting databases both on the hard drive and consequently on backup media.–Transparent_Data_Encryption When it comes to cryptography-related topics, we must first understand what security threats

Key management consists of four parts: key generation, key preservation, key exchange, and key rotation. Key Generation Only for the study of symmetric encryption, so I mainly introduce symmetric encryption. The symmetric password generation method is: A random number is a keyPassword-based key generationHKDF (HMAC-based extraction and extended key derivation) A random number is a key Random number as a key: Using a powerful random number generator to generate the key, this is easy to understand. Password-based key generation Password-based

The world’s most valuable resource is no longer the oil or gold, but data. And at the heart of data lies the database which is expected to store, process and retrieve the desired data as quickly as possible. But having a single database server doesn’t mostly serve the purpose. A single server has its drawbacks with a huge risk of data loss and downtime. So most data platforms use multiple replicated database servers to ensure high availability and fault tolerance

After introducing the difference between the AES modes, in this document, I will put the results about the AES modes performance. The following tests just use one core CPU. AES-NI:The Advanced Encryption Standard Instruction Set (or Intel Advanced Encryption Standard New Instructions, AES-NI for short) is an extension of the x86 instruction set architecture for Intel and AMD microprocessors, presented by Intel in March 2008. [1] The purpose of this instruction set is to improve the speed at which applications

PostgreSQL Community is working on adding built-in sharding in Postgres and it seems that the community is following the FDW based sharding approach for adding built-in sharding for PG. The Parallel Foreign Scan is a very important part of this approach as it will provide the functionality of executing multiple foreign scans in parallel and really improving the OLAP use case. I want to touch on how parallel foreign scan may be implemented in Postgres in this blog. Firstly I

Logical Replication appeared in Postgres10, it came along with number of keywords like ‘logical decoding’, ‘pglogical’, ‘wal2json’, ‘BDR’ etc. These words puzzle me so much so I decided to start the blog with explaining these terms and describing the relationships between them. I will also mention a new idea for a tool called ‘logical2sql’, it is designed for synchronization of data from Postgres to databases of different kinds and architectures. Difference Between Physical And Logical Replication Replication (Or Physical Replication)

Replication is a critical part of any database system that aims to provide high availability (HA) and effective disaster recovery (DR) strategy. This blog is aimed at establishing the role of replication in a database system. The blog will give a general overview of replication and its types as well as an introduction to replication options in PostgreSQL. The term Replication is used to describe the process of sharing information between one or more software or hardware systems; to ensure reliability,

Built-in Sharding Architecture The build-in sharding feature in PostgreSQL is using the FDW based approach, the FDW’s are based on sql/med specification that defines how an external data source can be accessed from the PostgreSQL server. PostgreSQL provides number of foreign data wrapper (FDW’s) that are used for accessing external data sources, the postgres_fdw is used for accessing Postgres database running on external server, MySQL_fdw is used for accessing MySQL database from PG, MongoDB_fdw is used for accessing MongoDB and

Declarative Partitioning So far we have discussed scalability, what is scalability, why and when you need and what are the different types of scalability. Now we are starting to get into the meat of this topic and will discuss declarative partitioning and sharding in PostgreSQL. The sharding functionality is being laid on top of declarative partitioning functionality in PostgreSQL. Declarative partitioning was released in PostgreSQL 10, prior to declarative partitioning PostgreSQL was using table inheritance and plpgsql triggers for providing table

I recently had the opportunity of taking part in couple of interesting talks on the future of sharding in PostgreSQL. The first talk was delivered by Bruce Momjain in PostgreSQL conf Ottawa (May 2019) in which he presented the future of Sharding in PostgreSQL and talked about current state and future of built-in sharding in PostgreSQL. The second talk was presented by myself in PostgreSQL conference in Beijing (China – July 2019) in which i also talked about the current

Recently, I did some work with Sawada-san on the TDE. So I studied on the encryption algorithm. So far, I study five modes in the AES. In this document, I will introduce the difference in the five kinds of mode. General The block ciphers are schemes for encryption or decryption where a block of plaintext is treated as a single block and is used to obtain a block of ciphertext with the same size. Today, AES (Advanced Encryption Standard) is

How Extensibility Works PostgreSQL is extensible because its operation is catalog-driven. PostgreSQL stores much more information in its catalogs: not only information about tables and columns, but also information about data types, functions, access methods, and so on. These tables can be modified by the user, and since PostgreSQL bases its operation on these tables, this means that PostgreSQL can be extended by users. The PostgreSQL server can moreover incorporate user-written code into itself through dynamic loading. The 3 directories

I will start with providing some explanation for the blog title before i dive into the blog itself. The title of this blog is that “Pgpool II 4.1 taking the bull by its horn”, the “bull” in this case is Pgpool II performance and “taking it by its horn” means that 4.1 is performance release and it aims to add significant performance improvement to Pgpool II. The question that often comes up is the performance difference when working with PostgreSQL

Just-in-time compilation of SQL statements is one of the new features in PostgreSQL 11. Just-in-Time compilation (JIT) is the process of turning some form of interpreted program evaluation into a native program and doing so at runtime. While doing some testing and benchmarking with JIT it appears that PostgreSQL 11 is about 29.31% faster at executing the TPC-H Q1 query than PostgreSQL 10. You could get details from https://www.citusdata.com/blog/2018/09/11/postgresql-11-just-in-time/. The JIT technique is implemented using LLVM. What is the LLVM? The