Balancing Act: Achieving High Availability with HAProxy for PostgreSQL

Introduction

This blog is aimed at readers who are learning the basics of PostgreSQL and HAProxy but already have some hands-on experience with both. For this tutorial, we will assume you have PostgreSQL correctly installed on Ubuntu. All of these steps were done using PostgreSQL 16 (development version) and HAProxy 2.6.9 on Ubuntu 23.04. We’ll go over an example of setting up multiple PostgreSQL servers, one primary and a few replicas, and show how traffic to them can be balanced using HAProxy.

The Setup

For our setup, we want something like the image below.

A user will use psql to interact with the Postgres database. We’ll have one primary database to handle both read and write requests, and two replicas that will handle read requests as well as any failovers that may occur. HAProxy sits between the user and the databases, acting as a load balancer. So when the user connects through HAProxy, they do not know which specific database they have reached, but it shouldn’t matter as long as all the data is replicated correctly.

We can then set this up with a few easy commands. First, we need to create the primary:

$ mkdir data1/ 
$ initdb data1/

Don’t forget to edit postgresql.conf and pg_hba.conf with the correct connection information. For this example, we are using port 10001 for the primary and 10002, 10003, 10004 for the replicas. Look at our previous blog here if you want to learn more about setting up replicas locally. Start the primary server once this configuration is done.
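
As an illustration only (the rep_user role and its password below are assumptions made for this example, not part of a default installation), the primary’s edits and the replication role might look something like this:

# postgresql.conf (data1/): listen on the primary's port 
port = 10001 
wal_level = replica          # the default in recent versions; required for streaming replication 

# pg_hba.conf (data1/): allow local replication connections for rep_user 
host    replication    rep_user    127.0.0.1/32    scram-sha-256 

# once the primary is running, create the replication role used by pg_basebackup below 
$ psql -p 10001 -d postgres -c "CREATE ROLE rep_user WITH REPLICATION LOGIN PASSWORD 'secret';"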

Now for our two replicas:

$ mkdir data2/ data3/ 
$ pg_basebackup -h localhost -p 10001 -U rep_user -X stream -C -S replica_1 -v -R -W -D $HOME/Documents/work/pg_test/data2/ 
$ pg_basebackup -h localhost -p 10001 -U rep_user -X stream -C -S replica_2 -v -R -W -D $HOME/Documents/work/pg_test/data3/

Don’t forget to also edit the configuration files of these replicas so they have the correct port numbers. Once you have done this, feel free to start the standby servers and check that they are replicating the primary’s data.
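
For instance, with the data directories above, starting the standbys and confirming they are streaming might look like this:

$ pg_ctl -D data2/ -l replica1.log start 
$ pg_ctl -D data3/ -l replica2.log start 

# on the primary, each standby should appear with state 'streaming' 
$ psql -p 10001 -d postgres -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"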

All we need now is HAProxy. Here is an example of our HAProxy configuration file for what we are trying to implement:

global 
    stats socket ipv4@:9999 level admin 

defaults 
    mode tcp 
    timeout connect 20s 
    timeout client 10m 
    timeout server 10m 

frontend front 
    bind localhost:9998 
    use_backend back 

backend back 
    balance roundrobin 
    server-template pg 1-10 localhost:10001

Let’s look at this config file line by line:

  • Lines 1-2: This is the global section that applies to everything in our config. Here we have defined a statistics socket on port 9999, which will be used for HAProxy runtime API commands in order to dynamically assign new servers. 
  • Lines 4-8: This is the defaults section that sets what values should be used when no specifications are made. Here we define our default mode to be TCP (instead of HTTP), which means all of our connections will be TCP since that is what psql uses. We also define some timeouts, mainly so we don’t get warning messages when starting HAProxy. 
  • Lines 10-12: This is our frontend, bound to port 9998. Here HAProxy will listen for TCP connections on port 9998 and forward them to the backend named “back”. 
  • Lines 14-16: This is our backend, which has 10 servers defined and balanced by a round-robin algorithm. The 10 servers are defined using server-template, which means that we have 10 server slots available to be dynamically allocated during HAProxy’s runtime. This number is important as it should cover all the servers we expect to need for the lifetime of our application; changing it would involve editing the configuration and restarting HAProxy.

To check that our configuration is valid we can run the following:

$ haproxy -f /etc/haproxy/haproxy.cfg -c 
Configuration file is valid

How you start HAProxy then depends on your system. Since I am running Ubuntu 23.04, I have two choices.

$ systemctl start haproxy.service

or

$ service haproxy start

Both will work on Ubuntu, but make sure to select the correct method for your operating system.

Once HAProxy is running, we need to activate our backend servers by assigning them to the correct ports that have our databases. We can do this by sending an HAProxy runtime API command over TCP to the port we defined as the global stats port in our configuration file. Here we will be using socat to deliver the command.

$ echo "set server back/pg1 addr 127.0.0.1 port 10001" | socat stdio tcp4-connect:127.0.0.1:9999 
$ echo "set server back/pg2 addr 127.0.0.1 port 10002" | socat stdio tcp4-connect:127.0.0.1:9999 
$ echo "set server back/pg3 addr 127.0.0.1 port 10003" | socat stdio tcp4-connect:127.0.0.1:9999

This will assign 3 of our 10 servers to ports 10001, 10002, and 10003, which are exactly the ports our databases are running on. Now whenever HAProxy receives a TCP connection on port 9998, it will forward it to the backend, which is now a pool of 3 active servers.
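
We can sanity-check the round robin by opening a few connections through HAProxy and asking each one which server port it actually reached. This assumes a role and database we can log in with (postgres is used here as a placeholder); inet_server_port() reports the port of the backend that answered the connection.

$ psql -h localhost -p 9998 -U postgres -d postgres -t -c "SELECT inet_server_port();" 
$ psql -h localhost -p 9998 -U postgres -d postgres -t -c "SELECT inet_server_port();" 
$ psql -h localhost -p 9998 -U postgres -d postgres -t -c "SELECT inet_server_port();"

Each new connection should report a different port as HAProxy rotates through 10001, 10002, and 10003.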

We can get a full list of commands that the HAProxy runtime API supports by running:

$ echo "help" | socat stdio tcp4-connect:127.0.0.1:9999

Dynamically Adding a New Server

Now that we have our initial setup running, let’s say our select traffic shoots up and we need another server to handle it. Create another Postgres instance that will act as our third replica.

$ mkdir data4/
$ pg_basebackup -h localhost -p 10001 -U rep_user -X stream -C -S replica_3 -v -R -W -D $HOME/Documents/work/pg_test/data4/

Like before, edit the new replica’s configuration so it listens on port 10004 and start it. We can then send a message to the running HAProxy instance to add the new server without restarting it:

$ echo "set server back/pg4 addr 127.0.0.1 port 10004" | socat stdio tcp4-connect:127.0.0.1:9999

This will assign one of our 10 template slots to the given address and port, giving us a total of 4 servers in our pool.

We can then use the show servers state runtime command to see our server states:

$ echo "show servers state" | socat stdio tcp4-connect:127.0.0.1:9999 
1 
# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_state bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord srv_use_ssl srv_check_port srv_check_addr srv_agent_addr srv_agent_port 
3 back 1 pg1 127.0.0.1 0 5 1 1 371 1 0 0 14 0 0 0 localhost 10001 - 0 0 - - 0 
3 back 2 pg2 127.0.0.1 0 5 1 1 371 1 0 0 14 0 0 0 localhost 10002 - 0 0 - - 0 
3 back 3 pg3 127.0.0.1 0 5 1 1 371 1 0 0 14 0 0 0 localhost 10003 - 0 0 - - 0 
3 back 4 pg4 127.0.0.1 0 5 1 1 371 1 0 0 14 0 0 0 localhost 10004 - 0 0 - - 0 
3 back 5 pg5 127.0.0.1 0 5 1 1 371 1 0 0 14 0 0 0 localhost 10001 - 0 0 - - 0 
3 back 6 pg6 127.0.0.1 0 5 1 1 371 1 0 0 14 0 0 0 localhost 10001 - 0 0 - - 0 
3 back 7 pg7 127.0.0.1 0 5 1 1 371 1 0 0 14 0 0 0 localhost 10001 - 0 0 - - 0 
3 back 8 pg8 127.0.0.1 0 5 1 1 371 1 0 0 14 0 0 0 localhost 10001 - 0 0 - - 0 
3 back 9 pg9 127.0.0.1 0 5 1 1 371 1 0 0 14 0 0 0 localhost 10001 - 0 0 - - 0 
3 back 10 pg10 127.0.0.1 0 5 1 1 371 1 0 0 14 0 0 0 localhost 10001 - 0 0 - - 0

As we can see, pg1 through pg4 have each been assigned their port, while the unused template slots still carry the default address from the configuration.

Now we can use psql to manage our database and have our select queries distributed among our 4 servers.

Note that this setup only works if we want to distribute read requests through HAProxy, like SELECT statements. This is because HAProxy does not care about the contents of the TCP connection; it simply sees a connection and forwards it based on round robin. If we issue a write query that modifies data, we run into a problem: replica servers cannot serve write requests and will return an error. Since 3 of our 4 servers are replicas, this setup will not work unless we plan on never modifying our data.
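
For example, if a write happens to be routed to one of the standbys, the session fails with a read-only error (which of the four servers answers depends on the rotation):

$ psql -h localhost -p 9998 -U postgres -d postgres -c "CREATE TABLE demo (id int);" 
ERROR:  cannot execute CREATE TABLE in a read-only transaction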

Improvements

Let’s look at some small improvements we can make to this example.

Separate Frontends

The first improvement I would make is separating the primary and standby databases into different frontends so that each gets its own port. We can then use HAProxy to modify data by connecting to the primary frontend, while serving all of our read requests through the replica frontend.

This HAProxy configuration might look like the following:

global 
    stats socket ipv4@:9999 level admin 

defaults 
    mode tcp 
    timeout connect 20s 
    timeout client 10m 
    timeout server 10m 

frontend primary 
    bind localhost:9997 
    use_backend primary_backend 

backend primary_backend 
    server pg-primary localhost:10001 

frontend replica 
    bind localhost:9998 
    use_backend replica_pool 

backend replica_pool 
    balance roundrobin 
    server-template pg-replica 1-10 localhost:10002

With two frontends we no longer need to activate our primary like before, since it is already a fully defined server. Our replica pool can still be dynamically allocated, allowing us to add or remove servers based on demand.
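
A brief sketch of using this layout, reusing the ports from earlier and the same placeholder postgres role as before: the replica slots are still assigned through the runtime API, writes go to port 9997, and reads go to port 9998.

$ echo "set server replica_pool/pg-replica1 addr 127.0.0.1 port 10002" | socat stdio tcp4-connect:127.0.0.1:9999 
$ echo "set server replica_pool/pg-replica2 addr 127.0.0.1 port 10003" | socat stdio tcp4-connect:127.0.0.1:9999 
$ echo "set server replica_pool/pg-replica3 addr 127.0.0.1 port 10004" | socat stdio tcp4-connect:127.0.0.1:9999 

# writes go through the primary frontend 
$ psql -h localhost -p 9997 -U postgres -d postgres -c "CREATE TABLE t (id int);" 

# reads go through the replica frontend 
$ psql -h localhost -p 9998 -U postgres -d postgres -c "SELECT count(*) FROM t;"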

Automation

The second improvement I would suggest is automating the allocation process by developing our own program that monitors the database load. This program would be its own service that periodically checks the average load of all the replicas and decides, based on this number, whether or not to create a new Postgres instance and add it to the replica pool. HAProxy’s runtime API makes this very easy, as all we need is a TCP library, which most languages have. We also need a way to create a new Postgres instance, which could be done using system commands if no PostgreSQL library exists for the language.
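
As a rough sketch of that idea (the threshold, the data5/ directory, port 10005, and the postgres role are all assumptions for illustration, not part of the setup above), a small shell script run periodically from cron could do the check and call the runtime API:

#!/bin/bash 
# Hypothetical monitor: average the client connection counts across the known 
# replica ports and, past a threshold, start a pre-provisioned replica and 
# register it with HAProxy through the runtime API. 
THRESHOLD=50 
total=0 
for port in 10002 10003 10004; do 
    n=$(psql -h localhost -p "$port" -U postgres -d postgres -t -A \ 
        -c "SELECT count(*) FROM pg_stat_activity;") 
    total=$((total + n)) 
done 
avg=$((total / 3)) 

if [ "$avg" -gt "$THRESHOLD" ]; then 
    # data5/ is assumed to have been prepared earlier with pg_basebackup (port 10005) 
    pg_ctl -D data5/ -l replica4.log start 
    echo "set server replica_pool/pg-replica4 addr 127.0.0.1 port 10005" \ 
        | socat stdio tcp4-connect:127.0.0.1:9999 
fi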

Conclusion

In this blog, we went over an example of dynamically adding new replica servers to our primary PostgreSQL database using HAProxy. Replicating data across servers is not only a defense against data loss from crashes but also a way to improve data availability, and being able to add replicas dynamically becomes critical for any application that relies on that data being consistently accessible. If you’re designing a new software architecture, don’t forget to think about how replication can play into the robustness of your design.