PostgreSQL Load Balancing Made Easy: A Deep Dive into pgpool


Introduction

This blog is aimed at readers who already have some PostgreSQL experience under their belt and want to learn the basics of PGPool. For this tutorial, we will assume you have PostgreSQL correctly installed on Ubuntu. All of these steps were done using PostgreSQL 16 (development version) and PGPool 4.4.3 on Ubuntu 23.04. We’ll go over an example of setting up multiple PostgreSQL servers, one primary and two replicas, and how they can be balanced using PGPool.

Setting Up

Our overall architecture will be PGPool sitting in front of 3 PostgreSQL instances, 1 primary and 2 replicas. It will look something like this:
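                 clients
                    |
             pgpool (port 9999)
            /       |        \
      primary   replica 1   replica 2
      (5432)     (5433)      (5434)

The port numbers here match the pgpool.conf we’ll write below; clients only ever connect to pgpool, which decides where each query goes.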

To get this setup we first have to create our primary database:

$ mkdir data/
$ initdb data/

Set up the primary as you would to allow for replication (edit the correct postgresql.conf and pg_hba.conf files). If you want to learn more about setting up database replication in PostgreSQL, see my previous blog here.
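If you just want something concrete to start from, here is a minimal sketch (assuming streaming replication over localhost and the default postgres superuser; your environment may need different values):

# postgresql.conf (primary)
wal_level = replica          # enough WAL detail for streaming replication (the default on recent versions)
max_wal_senders = 10         # allow replication connections (also the default)

# pg_hba.conf (primary)
host    replication    postgres    127.0.0.1/32    trust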

Once set up, start the database:

$ pg_ctl -D data/ -l logfile start

Follow these steps again to create both replicas and set them up to replicate the primary database.
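One common way to do this is pg_basebackup; the -R flag writes standby.signal and the primary_conninfo setting for each replica so it starts streaming from the primary. The directories and ports below are simply the ones referenced later in pgpool.conf:

$ pg_basebackup -h localhost -p 5432 -D data1/ -R
$ pg_basebackup -h localhost -p 5432 -D data2/ -R
$ pg_ctl -D data1/ -o "-p 5433" -l logfile1 start
$ pg_ctl -D data2/ -o "-p 5434" -l logfile2 start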

Lastly, we need to start PGPool in order to pool our databases together. We’re going to use the following configuration file, pgpool.conf:

listen_addresses = 'localhost'
port = 9999

pcp_listen_addresses = 'localhost'
pcp_port = 9898

load_balance_mode = on

#Backend 0
backend_hostname0 = 'localhost'
backend_port0 = 5432 
backend_weight0 = 0.2
backend_data_directory0 = '/home/tristen/pg/data'
backend_flag0 = 'ALWAYS_PRIMARY'
backend_application_name0 = 'primary'

#Backend 1
backend_hostname1 = 'localhost'
backend_port1 = 5433 
backend_data_directory1 = '/home/tristen/pg/data1' 
backend_weight1 = 0.4
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'server1' 

#Backend 2
backend_hostname2 = 'localhost'
backend_port2 = 5434
backend_data_directory2 = '/home/tristen/pg/data2'
backend_weight2 = 0.4
backend_flag2 = 'ALLOW_TO_FAILOVER'
backend_application_name2 = 'server2' 

Let’s examine what this config file is doing:

  • Lines 1-2 (listen_addresses, port): the host address and port number that pgpool will be listening on for client connections.
  • Lines 4-5 (pcp_listen_addresses, pcp_port): the host address and port number that PCP will be listening on.
  • Line 7 (load_balance_mode): enables load balancing for pgpool.
  • Lines 10-15 (the backend_*0 parameters): define backend 0 (the primary): the host and port it can be reached on, the directory holding its PostgreSQL data files, its load-balancing weight, its connection flag (ALWAYS_PRIMARY here), and its application name.
  • Lines 17-31: the same parameters as lines 10-15, but for the other two backends (the replicas), both flagged ALLOW_TO_FAILOVER.

Once this file is in place, PGPool can be started simply by running:

$ pgpool
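If pgpool.conf is not in pgpool’s default configuration directory (typically /usr/local/etc when built from source), you can point to it explicitly; the -n flag keeps pgpool in the foreground, which is handy while testing:

$ pgpool -f /usr/local/etc/pgpool.conf -n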

Now that we have everything running, let’s dive into the example.

Load Balancing

There are a few ways we can see load balancing in action. The easiest way would be to use a utility provided by PostgreSQL, namely pgbench. This tool is for benchmarking PostgreSQL databases and will run a large load of queries to test how a database handles traffic. We can use pgbench like so:

$ pgbench -i postgresql://postgres@localhost:9999/postgres
$ pgbench -S -c 10 -j 2 -t 100 postgresql://postgres@localhost:9999/postgres

The first command initializes the database at the specified host and port with the tables pgbench needs. The second command runs the benchmark itself: -S runs the built-in select-only script, -c sets the number of simulated clients that connect to the database, -j the number of worker threads pgbench uses, and -t the number of transactions each client runs.

Once pgbench finishes, we’ll issue a query to the database to see how SELECT traffic was divided among the primary and replicas. Note that only read traffic can be balanced; any write requests have to be directed to the primary. PGPool knows this and forwards traffic accordingly, so we don’t have to worry about it.

$ psql postgresql://postgres@localhost:9999/postgres -c "show pool_nodes;"
node_id | hostname  | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+-----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | localhost | 5432 | up     | unknown   | 0.200000  | primary | unknown | 101        | false             | 0                 |                   |                        | 2023-09-11 12:50:32
 1       | localhost | 5433 | up     | unknown   | 0.400000  | standby | unknown | 200        | false             | 0                 |                   |                        | 2023-09-11 12:50:32
 2       | localhost | 5434 | up     | unknown   | 0.400000  | standby | unknown | 700        | true              | 0                 |                   |                        | 2023-09-11 12:50:32
(3 rows)

The result of this query is a table with one row per node. If we look at the select_cnt column, we can see the number of SELECT queries each database received. Over many sessions, each database should receive an amount of traffic roughly proportional to the weight defined for it in pgpool.conf (pgpool picks the load-balance node once per session, so short runs with few clients can look uneven, as above).
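Another quick way to watch the balancing (just a handy trick, not something pgpool requires) is to ask PostgreSQL which backend actually answered. Each new psql invocation is a new session, so running this a few times should report a mix of 5432, 5433 and 5434:

$ psql postgresql://postgres@localhost:9999/postgres -At -c "SELECT inet_server_port();"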

But what if we don’t need all of these replicas, and only want to add nodes when traffic goes up? Well, let’s see how we can make pgpool dynamic!

Make it Dynamic!

How will we complete this herculean task of adding and removing nodes at will during pgpool’s runtime? Thankfully pgpool implements an administrative protocol called PCP. pgpool listens for management requests on a separate PCP port and handles them without needing to restart, so there is no downtime. Let’s see how we can use this protocol to control our pool of databases.

First, we need to set up some authentication for this protocol. To specify a username and password we must add it to a file called pcp.conf which is typically located in /usr/local/etc/. The contents of the file are in the following format:

[username]:[md5 password]

We can include multiple username and password pairs, one per line. Let’s add a user “postgres” with the password “postgres”; the entry would look something like:

postgres:e8a48653851e28c69d0506508fb27fc5

We can use the command line tool md5sum to hash our password. For example:

$ echo -n "postgres" | md5sum
e8a48653851e28c69d0506508fb27fc5  -

Notice we have to use the -n flag so echo leaves out the trailing newline; otherwise the newline gets hashed along with the password and the stored hash will never match what we type.
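If you prefer to do this in one step (purely a convenience; adjust the path if your pcp.conf lives elsewhere), you can generate the entry and append it directly:

$ echo "postgres:$(echo -n "postgres" | md5sum | awk '{print $1}')" >> /usr/local/etc/pcp.conf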

Using the PCP Interface

Now that pgbench has terminated, there is currently no traffic to any of our databases. Having 2 replicas might be excessive, especially if we are running with limited processing power. So let’s remove one of these nodes:

$ pcp_detach_node -h localhost -p 9898 -U postgres -n 2

Once run, you will be prompted for a password; enter the one we defined in pcp.conf (in my case, “postgres”). This command sends a detach message to port 9898 (the PCP port defined in pgpool.conf) on localhost as user postgres (the PCP user we defined above). We also tell it to detach node 2, our second replica. PGPool zero-indexes its nodes, so the primary is node 0, the first replica is node 1, and so on.

Once this command is issued, our second replica will be detached and will no longer receive traffic, regardless of its defined weight. We can then shut down or delete this database altogether, depending on our needs.

Let’s check our pool nodes again:

$ psql postgresql://postgres@localhost:9999/postgres -c "show pool_nodes;"
 node_id | hostname  | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+-----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | localhost | 5432 | up     | unknown   | 0.200000  | primary | unknown | 502        | false             | 0                 |                   |                        | 2023-09-11 12:50:32
 1       | localhost | 5433 | up     | unknown   | 0.400000  | standby | unknown | 801        | true              | 0                 |                   |                        | 2023-09-11 12:50:32
 2       | localhost | 5434 | down   | unknown   | 0.400000  | standby | unknown | 700        | false             | 0                 |                   |                        | 2023-09-11 12:52:16
(3 rows)

We can see our nodes have changed: the status of node 2 has been updated from “up” to “down”. Also, the select_cnt for the third node has not changed, and all of the traffic has gone to the first two.

Reattaching a node is a very similar process. Let’s say that we run pgbench again and our traffic shoots up. We want our replica back to take on some of the traffic from the first two nodes.

$ pcp_attach_node -h localhost -p 9898 -U postgres -n 2

Enter the password when prompted. This uses the same parameters as before since we are reattaching the same node. Once issued, let’s check our pool status again:

$ psql postgresql://postgres@localhost:9999/postgres -c "show pool_nodes;"
node_id | hostname  | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+-----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | localhost | 5432 | up     | unknown   | 0.200000  | primary | unknown | 703        | true              | 0                 |                   |                        | 2023-09-11 12:50:32
 1       | localhost | 5433 | up     | unknown   | 0.400000  | standby | unknown | 801        | false             | 0                 |                   |                        | 2023-09-11 12:50:32
 2       | localhost | 5434 | up     | unknown   | 0.400000  | standby | unknown | 1501       | false             | 0                 |                   |                        | 2023-09-11 12:54:00
(3 rows)

We can see that not only has the status of replica 2 changed to “up” but also its select_cnt column has increased, meaning it is a part of the pool and is being balanced.

More PCP commands to control PGPool can be found here.
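For example, assuming the same host, port and user as above, these read-only commands report on the pool without changing anything:

$ pcp_node_count -h localhost -p 9898 -U postgres
$ pcp_node_info -h localhost -p 9898 -U postgres -n 1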

Automating

Everything we’ve done so far has involved manually adding and removing nodes from PGPool. Wouldn’t it be great if we could write a bash script that did all of this for us? Well, since we have to enter a password for each PCP command, a script is not feasible, right? This is where the pcppass file comes in.

Create a file called .pcppass containing the username and password for our PCP user, either in the user’s home directory or at a path referenced by the $PCPPASSFILE environment variable. The file must also have permissions 0600, or it will be ignored.

$ touch ~/.pcppass
$ chmod 0600 ~/.pcppass

Now add the username and password combination we defined in pcp.conf. Note that this password is not an md5 hash like it was in pcp.conf; it is stored in plain text. We also have to include the host and port that PCP is listening on. The format looks like this:

hostname:port:username:password

For our example, this would look like:

localhost:9898:postgres:postgres

With this set up, we can use the -w flag, which is common to all PCP commands. When specified, this flag stops PCP commands from prompting for a password. If the password file is invalid or cannot be found while -w is used, the connection attempt will simply fail instead of falling back to a prompt.

So, assuming our PCPPASSFILE is valid and exists, we can use the following without being prompted for a password:

$ pcp_detach_node -w -h localhost -p 9898 -U postgres -n 2

We can add this to a script and have it trigger a node detach once some condition is met. Building this script is up to you, but it should be relatively simple once this condition logic is sorted out.
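As a rough, hypothetical sketch (the threshold, the connection URI and the use of pg_stat_activity as the load signal are all our own choices here, not pgpool requirements), such a script might look like this:

#!/bin/bash
# Sketch: attach or detach replica 2 based on how busy the pool currently looks.
# Assumes the .pcppass file from above so that -w works without a prompt.

PGPOOL_URI="postgresql://postgres@localhost:9999/postgres"
THRESHOLD=20   # arbitrary number of client backends that counts as "busy"

# Count active client connections as seen through the pool.
ACTIVE=$(psql "$PGPOOL_URI" -At -c \
  "SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'client backend';")

if [ "$ACTIVE" -gt "$THRESHOLD" ]; then
    # Traffic is high: make sure replica 2 is attached and taking queries.
    pcp_attach_node -w -h localhost -p 9898 -U postgres -n 2
else
    # Traffic is low: drop replica 2 from the pool.
    pcp_detach_node -w -h localhost -p 9898 -U postgres -n 2
fi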

Conclusion

In this blog, we went over an example of dynamically adding replica servers to our PostgreSQL pool using PGPool. We first went over how to set up pgpool to load balance one primary database with two replicas. Then we saw how replicas can be dynamically added and removed from a balancing pool during runtime and how this process can be automated. Being able to change replicas on the fly with zero downtime for clients is essential for any application that requires high availability. So next time you’re building an application architecture, consider how database scaling could boost your throughput.