1. Overview
This blog describes a very experimental load-balancing setup using pgpool and Neon compute nodes, which are based on Postgres. There are many load-balancing discussions around different Postgres distributions, but they are mostly based on shared-nothing architectures. The load-balancing setup discussed here is based on shared storage, using the Neon serverless solution.
2. Setup storage and compute nodes
To better understand the setup process, this blog demonstrates the setup using binaries built on Ubuntu 22.04; we can discuss the Docker-based setup in a separate blog if more people are interested in this topic.
2.1. Build binaries
First, refer to Building on Linux to install all the build dependencies. Before building the neon binaries, you need to build protobuf from source so that protoc supports the optional keyword. Here are the steps I used to work around the build issues:
$ git clone https://github.com/protocolbuffers/protobuf.git
$ cd protobuf
$ git submodule update --init --recursive
$ cmake -Dprotobuf_BUILD_SHARED_LIBS=ON .
$ cmake --build . --parallel 4
$ sudo make install
$ protoc --version
libprotoc 22.3
After installing protobuf, follow Building on Linux to recursively clone the repo, and then check out a specific commit/release. You may not be able to set up the hot standby replica if you use a different commit.
$ git clone --recursive https://github.com/neondatabase/neon.git
$ cd neon
$ git checkout remotes/upstream/releases/2023-05-02 -b hot-standby-replica
$ make -j`nproc` -s
2.2. Start storage and Primary compute nodes
Building the storage and compute nodes will take a while. After all binaries build successfully, run the commands below:
$ ./target/debug/neon_local init
$ ./target/debug/neon_local start
$ ./target/debug/neon_local tenant create --set-default
$ ./target/debug/neon_local endpoint start main
$ ./target/debug/neon_local endpoint list
After running the above commands, you will have the main branch started as the Primary Postgres compute node.
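If you want to confirm that the primary is accepting connections before moving on, a quick psql check (using the cloud_admin user and port 55432, which are used throughout the rest of this post) can look like this:
$ psql -p55432 -h 127.0.0.1 -U cloud_admin postgres -c 'select version();'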
2.3. Prepare settings for hot standby
Since the neon shared-storage serverless solution currently relies on a specification file to manage the compute nodes, we must duplicate the original spec.json and make slight modifications to enable a hot-standby node.
$ cp -pr ./docker-compose/compute_wrapper/var/db/postgres/specs/spec.json spec-hotsb.json
$ curl -sb --request GET \
--url http://127.0.0.1:9898/v1/tenant \
--header 'accept: application/json' | jq
[
{
"id": "15f0247fde7e6d8e6125c44f7bd17ba1",
... ...
$ tenant_id="15f0247fde7e6d8e6125c44f7bd17ba1"
$ curl -sb --request GET \
--url http://127.0.0.1:9898/v1/tenant/$tenant_id/timeline \
--header 'accept: application/json' | jq
[
{
"tenant_id": "15f0247fde7e6d8e6125c44f7bd17ba1",
"timeline_id": "a40a56c38bb581456089dd33b5379cf8",
... ...
We now have both the tenant_id and the timeline_id, which we can use to modify the spec-hotsb.json file. The following diff shows the changes that need to be made:
$ diff ./docker-compose/compute_wrapper/var/db/postgres/specs/spec.json spec-hotsb.json
43c43
< "value": "55433",
---
> "value": "55434",
103c103
< "value": "safekeeper1:5454,safekeeper2:5454,safekeeper3:5454",
---
> "value": "127.0.0.1:5454",
108c108
< "value": "TIMELINE_ID",
---
> "value": "a40a56c38bb581456089dd33b5379cf8",
113c113
< "value": "TENANT_ID",
---
> "value": "15f0247fde7e6d8e6125c44f7bd17ba1",
118c118
< "value": "host=pageserver port=6400",
---
> "value": "host=127.0.0.1 port=64000",
120a121,130
> {
> "name": "hot_standby",
> "value": "on",
> "vartype": "bool"
> },
> {
> "name": "primary_conninfo",
> "value": "dbname=postgres user=cloud_admin host=127.0.0.1 port=55432 sslmode=disable gssencmode=disable target_session_attrs=any",
> "vartype": "string"
> },
The modifications to spec-hotsb.json mainly include starting the hot standby on a different port, configuring the IP and port for the safekeeper and pageserver, specifying the tenant_id and timeline_id used by the primary node, enabling hot_standby mode, and adding primary_conninfo.
2.4. Start hot standby compute node
In the neon shared-storage serverless solution, the compute node is stateless, and you may need to configure replication access in pg_hba.conf on the Primary (and reload the configuration through psql) if you run the hot-standby on a different machine. The following commands can be used for reference in such a case, but they are not necessary if you run everything on the same machine.
$ echo -e "\nhost\treplication\tall\t\t0.0.0.0/0\t\ttrust" >> .neon/endpoints/main/pgdata/pg_hba.conf
$ psql -h localhost -U cloud_admin -p 55432 -c "select pg_reload_conf();" postgres
$ psql -h localhost -U cloud_admin -p 55432 -c "select pg_read_file('pg_hba.conf');" postgres
Create the pgdata structure exactly as shown below, and then use compute_ctl to start the hot-standby.
$ mkdir -p .neon/endpoints/hotsb/pgdata
$ ./target/debug/compute_ctl \
--pgdata .neon/endpoints/hotsb/pgdata \
-C "postgresql://cloud_admin@localhost:55434/postgres" \
-b ./pg_install/v14/bin/postgres \
-S spec-hotsb.json
You may see an error like the one below:
ERROR could not start the compute node: db error: ERROR: cannot execute ALTER ROLE in a read-only transaction
... ...
INFO shutting down
Don’t worry. Run ps -ef | grep postgres to check whether the hot standby is running, or run the endpoint list command below.
$ ./target/debug/neon_local endpoint list
ENDPOINT ADDRESS TIMELINE BRANCH NAME LSN STATUS
hotsb 127.0.0.1:55434 a40a56c38bb581456089dd33b5379cf8 main 0/169AFA0 running
main 127.0.0.1:55432 a40a56c38bb581456089dd33b5379cf8 main 0/169AFA0 running
2.5. Verify Primary and hot standby setup
Let’s run a simple query to verify the hot standby is working before we continue to the pgpool setup. First, run the following commands to create a table and insert some records.
$ psql -p55432 -h 127.0.0.1 -U cloud_admin postgres -c 'create table t(a int, b int);'
$ psql -p55432 -h 127.0.0.1 -U cloud_admin postgres -c 'insert into t values(generate_series(1, 10), generate_series(1, 10));'
Next, run the following two queries against the Primary and the hot-standby; you should see the same output from both.
$ psql -p55432 -h 127.0.0.1 -U cloud_admin postgres -c 'select count(*) from t;'
$ psql -p55434 -h 127.0.0.1 -U cloud_admin postgres -c 'select count(*) from t;'
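As an extra sanity check (not strictly necessary, just a quick way to confirm the roles), you can ask each node whether it is in recovery; the Primary should return f and the hot standby t:
$ psql -p55432 -h 127.0.0.1 -U cloud_admin postgres -c 'select pg_is_in_recovery();'
$ psql -p55434 -h 127.0.0.1 -U cloud_admin postgres -c 'select pg_is_in_recovery();'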
3. Setup pgpool2
If all the above setups have been completed successfully, you can proceed with setting up pgpool2 from the source code.
3.1. Build pgpool2
Building a stable pgpool release from the source code is a simple process. You can follow these steps:
$ git clone https://github.com/pgpool/pgpool2.git
$ cd pgpool2
$ git checkout V4_4_STABLE
$ ./configure
$ make
3.2. Start pgpool2
Copy pgpool.conf.sample as pgpool.conf and add the following configuration to the end of pgpool.conf:
port = 9999
pcp_port = 9998
statement_level_load_balance = on
sr_check_user = 'cloud_admin'
health_check_user = 'cloud_admin'
recovery_user = 'cloud_admin'
wd_lifecheck_user = 'cloud_admin'
Then, add the following parameters to pgpool.conf for the load balance test:
backend_hostname0 = '127.0.0.1'
backend_port0 = 55432
backend_weight0 = 0.1
backend_data_directory0 = '/home/david/sandbox/neon/.neon/endpoints/main/pgdata'
backend_application_name0 = 'main'
# backend_hostname1 = '127.0.0.1'
# backend_port1 = 55434
# backend_weight1 = 0.3
# backend_data_directory1 = '/home/david/sandbox/neon/.neon/endpoints/hotsb/pgdata'
# backend_application_name1 = 'hotsb'
Since we want to manually scale the compute nodes up and down through the pcp interface, we need to create a pcp.conf file and add the following two lines to set postgres as the password:
cat pcp.conf
postgres:e8a48653851e28c69d0506508fb27fc5
# USERID:MD5PASSWD
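The value after the colon is simply the MD5 hash of the plain-text password, so you can generate it yourself, for example with md5sum (pgpool also ships a pg_md5 utility for the same purpose):
$ echo -n "postgres" | md5sum
e8a48653851e28c69d0506508fb27fc5  -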
Finally, you can start pgpool with a command like the following to observe the log:
./pgpool2/src/pgpool -f pgpool.conf -F pcp.conf -n -D -C
3.3. Verify pgpool2
To verify that pgpool is running properly, run the following command:
$ psql -p9999 -h 127.0.0.1 -U cloud_admin postgres -c 'show pool_nodes;'
node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+-----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | 127.0.0.1 | 55432 | up | unknown | 1.000000 | primary | unknown | 0 | true | 0 | | | 2023-05-12 14:31:48
(1 row)
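At this point you can also talk to pgpool's pcp interface directly, using the postgres pcp user configured above, to confirm how many backends pgpool knows about; for example (assuming pcp_node_count was built next to the other pcp tools used later in this post):
$ ./pgpool2/src/tools/pcp/pcp_node_count -h 127.0.0.1 -U postgres -p 9998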
4. Load balancing
To test the load balancing configuration, we will use pgbench to generate some load on the system: first initialize the database with pgbench, and then run 10 queries using a for loop.
$ pgbench -i -s 2 -p 9999 -U cloud_admin -h 127.0.0.1 postgres
$ port=9999
$ for i in {1..10}
do
echo "Query $i"
psql -U cloud_admin -h 127.0.0.1 -c "SELECT min(aid), max(aid) FROM pgbench_accounts" -p $port postgres
done
After running the queries, check the pgpool status again to verify that all 10 queries were assigned to the primary node:
$ psql -p9999 -h 127.0.0.1 -U cloud_admin postgres -c 'show pool_nodes;'
node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+-----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | 127.0.0.1 | 55432 | up | unknown | 1.000000 | primary | unknown | 10 | true | 0 | | | 2023-05-12 14:31:48
(1 row)
The select_cnt column confirms that all 10 queries were handled by the primary node.
4.1. Scale up
Assuming there’s a lot of load at the moment and you want to shift some read-only load to the hot-standby, follow these steps:
- Uncomment the backend_xxxx1 related settings in pgpool.conf.
- Reload the pgpool.conf file with the following command:
$ ./pgpool2/src/pgpool -f pgpool.conf reload
- Attach backend1 to pgpool with the password postgres using the following command:
$ ./pgpool2/src/tools/pcp/pcp_attach_node -h 127.0.0.1 -U postgres -p9998 -n 1
Run the following commands to repeat the 10 queries:
$ port=9999
$ for i in {1..10}
do
echo "Welcome $i times"
psql -U cloud_admin -h 127.0.0.1 -c "SELECT min(aid), max(aid) FROM pgbench_accounts" -p $port postgres
done
After the queries finish, check the pgpool status again. You should see that the hot-standby helped handle some read-only queries, as shown below:
$ psql -p9999 -h 127.0.0.1 -U cloud_admin postgres -c 'show pool_nodes;'
node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+-----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | 127.0.0.1 | 55432 | up | unknown | 0.250000 | primary | unknown | 12 | false | 0 | | | 2023-05-12 14:31:48
1 | 127.0.0.1 | 55434 | up | unknown | 0.750000 | standby | unknown | 8 | true | 0 | | | 2023-05-12 15:26:24
(2 rows)
4.2. Scale down
To scale down when there is not much load on the system, we detach backend1 from pgpool through the pcp interface, comment its settings back out in pgpool.conf, and reload the configuration. The following commands can be used in sequence for this purpose:
./pgpool2/src/tools/pcp/pcp_detach_node -h 127.0.0.1 -U postgres -p9998 -n 1
vim pgpool.conf
./pgpool2/src/pgpool -f pgpool.conf reload
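For reference, the vim pgpool.conf step above just reverts the scale-up change, i.e. it comments the backend1 settings back out:
# backend_hostname1 = '127.0.0.1'
# backend_port1 = 55434
# backend_weight1 = 0.3
# backend_data_directory1 = '/home/david/sandbox/neon/.neon/endpoints/hotsb/pgdata'
# backend_application_name1 = 'hotsb'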
After this, run the 10 queries again and check the pgpool status. The result should show that backend1 is down and its select_cnt remains unchanged, while the primary handled all the new queries. The following command can be used to check the pgpool status:
$ psql -p9999 -h 127.0.0.1 -U cloud_admin postgres -c 'show pool_nodes;'
node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+-----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | 127.0.0.1 | 55432 | up | unknown | 0.250000 | primary | unknown | 22 | true | 0 | | | 2023-05-12 14:31:48
1 | 127.0.0.1 | 55434 | down | unknown | 0.750000 | standby | unknown | 8 | false | 0 | | | 2023-05-12 15:30:55
(2 rows)
5. Summary
In this blog post, we demonstrated a simple load-balancing setup using pgpool and Neon compute nodes. We manually scaled up to shift read-only load to a hot-standby and scaled down to remove it when the load decreased. Based on these test results, you can try to automate these processes. Overall, this load-balancing setup can help improve database performance and increase availability.