Load balancing increases the system throughput from the application standpoint by distributing the read load from application to multiple replicated database nodes. Some time ago, I wrote a blog discussing effectiveness of Pgpool-II load balancer in different scenarios.
Pgpool-II provides one of the best user-friendly load balancers for PostgreSQL that does not require any modification of the client or application side and provides seamless load-balancing on a single client connection.
In this post, I will discuss the configuration options provided by Pgpool-II to setup
and tune the load-balancer
The settings of Pgpool-II load balancing can be categorized into five categories
- Basic settings and load Balancing modes
- Setting the load distribution ratios
- Specifying queries that shouldn’t be load balanced
- On-demand routing options
- Tuning the behavior in transactions
This is the first in a series of two-part blogs that will outline options for the first two categories.
Basic settings and load Balancing modes
This category of configuration parameters deals with enabling/disabling the load balance feature and configuring the load balancing modes.
load_balance_mode (boolean config)
This is the main switch of the Pgpool-II load balancer. Turning it on will begin all the exciting things.
Load balancing modes
When load balancing is enabled, Every user session to the PostgreSQL server through Pgpool-II uses two DB nodes to serve the client requests. One is the primary server (for WRITE statements) while the other is load_balance_node
which is used for sending the read queries for distributing the load.
Before Pgpool-II version 4.1, the load balancing node was selected only at the session start time, and during the life span of the user session the same load-balancing node was used.
One of the major enhancements of Pgpool-II Version 4.1 was the introduction of statement_level_load_balance
feature. It enables Pgpool-II to select a new load balance node for every statement.
Session level load balancing mode
This is the default and the classical load balancing mode in which the same load-balancing node is used throughout the user session.
Setting statement_level_load_balance to off will make the Pgpool-II load balancer work in this mode.
Statement level load balancing mode
Setting statement_level_load_balance to on will make the Pgpool-II select a new load balance node for each read statement.
Which mode is best for your application?
However, very small it may be, but selecting a load balancing node consumes the CPU cycle and has its own overheads. So selecting the right load balancing mode is important. Some of the considerations for selecting the mode could be.
- If the cluster has only one primary and one standby, there is no need to enable statement-level load balancing.
- If the setup has more than one read replicas and the application has its own connection pooling then statement-level load balancing is the best choice.
- If the applications in the setup normally issue a few queries for each session and create a lot of sessions, then session-level load balancing mode is the right way to go. While for batch processing and long user sessions, statement-level load balancing performs better.
Identifying which backend node is in use for load balancing
Pgpool-II provides ‘show pool_nodes‘ command that displays the information about all configured DB nodes. true
in ‘load_balance_node
‘ column for particular backend node means the DB node is destined for READ queries for the current session.
postgres=# show pool_nodes;
node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+-----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | localhost | 5432 | up | unknown | 0.333333 | primary | primary | 0 | false | 0 | | | 2021-08-20 16:07:29
1 | localhost | 5444 | up | unknown | 0.333333 | standby | primary | 0 | true | 0 | | | 2021-08-20 16:07:29
2 | localhost | 5555 | up | unknown | 0.333333 | standby | primary | 0 | false | 0 | | | 2021-08-20 16:07:29
(3 rows)
Similarly, If you execute show pool_nodes
repeatedly from the same session with statement-level load balancing enabled, different load_balance_node will appear after every SELECT statement.
postgres=# pgpool show statement_level_load_balance; statement_level_load_balance ------------------------------ on (1 row) postgres=# show pool_nodes; node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+-----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | localhost | 5432 | up | unknown | 0.333333 | primary | primary | 2 | true | 0 | | | 2021-08-20 16:07:29 1 | localhost | 5444 | up | unknown | 0.333333 | standby | primary | 0 | false | 0 | | | 2021-08-20 16:07:29 2 | localhost | 5555 | up | unknown | 0.333333 | standby | primary | 0 | false | 0 | | | 2021-08-20 16:07:29 (3 rows) postgres=# SELECT 'change the load balancer node'; ?column? ------------------------------- change the load balancer node (1 row) postgres=# show pool_nodes; node_id | hostname | port | status | pg_status | lb_weight | role | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change ---------+-----------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+--------------------- 0 | localhost | 5432 | up | unknown | 0.333333 | primary | primary | 2 | false | 0 | | | 2021-08-20 16:07:29 1 | localhost | 5444 | up | unknown | 0.333333 | standby | primary | 1 | true | 0 | | | 2021-08-20 16:07:29 2 | localhost | 5555 | up | unknown | 0.333333 | standby | primary | 0 | false | 0 | | | 2021-08-20 16:07:29 (3 rows) postgres=#
Setting the load distribution ratios
To distribute the read load among the standby DB nodes, Pgpool-II selects one of the backend nodes as a load balance node. That node can be used throughout the user session for ‘session level load balancing
‘ mode or can be replaced by a new load balancing node for each read statement when statement-level load balancing is enabled. The selection of the load balance node is influenced by the backend_weight config value for each node.
Although the backend_weight parameter accepts any positive integer value, all the backend_weights eventually gets normalized on the scale of [0 to 1]
For example:
If you have 2 nodes configured and each node has backend_weight = 1, that would mean the normalized weight for each node will be 0.5 and both will get 50% of SELECT queries. Similarly assigning the backend_weight =1 to each node in a three-node cluster will make the normalized weight for each node to 0.3. Effectively each node will receive 33% of the READ traffic.
Deciding backend weights
One important point to note here is that the primary server node can also be chosen as a load balancing node by pgpool-II, depending on the backend_weight of the primary. So for sessions where the primary node gets chosen as a load balance node all the read and write traffic will end up on the same server.
Selecting the correct weight ratio is very important to ensure the load is evenly distributed to get the maximum throughput. Since the primary server has an additional responsibility of handling all the WRITE requests, so assigning a smaller backend_weight value for the primary server gives better performance.
Selecting the right backend weight also depends on the type of workload. For example, if you are designing a system that has, say, more than 30 percent of write load, then setting the backend_weight for primary to 0 (Do not use PRIMARY as a load balancing node) yields better performance. Even for less write-intensive workloads offloading the primary from serving the read load are a good option.
Other than that if the hardware or network specifications of DB nodes are not homogeneous and some standby nodes or primary node is running on a higher spec machine, then you may consider distributing the load accordingly to get the best overall throughput.
Another important aspect to keep in mind while configuring the backend_weight is that Pgpool-II selects the load balance node randomly based on the backend_weight values (Selection is not round-robin or LRU based), and especially for the session-based load balancing mode the read load distribution may appear little off compared to the backend weights in short term, but it eventually catches up, So you might want to inspect the effect of load balancing after a significant number of queries have been sent from multiple sessions.
Conclusion
This blog post discussed the Pgpool-II load balancing setting related to the selection of load balancer nodes and configuring the load distribution ratio along with the optimum values for related configuration parameters.
In the next part, I will focus on the more granular aspects of load balancing options provided by Pgpool-II to configure advanced routing options and controlling the behaviors with transactions to ensure we don’t read outdated data from standby within the transaction.
Muhammad Usama is a database architect / PostgreSQL consultant at HighGo Software and also Pgpool-II core committer. Usama has been involved with database development (PostgreSQL) since 2006, he is the core committer for open source middleware project Pgpool-II and has played a pivotal role in driving and enhancing the product. Prior to coming to open source development, Usama was doing software design and development with the main focus on system-level embedded development. After joining the EnterpriseDB, an Enterprise PostgreSQL’s company in 2006 he started his career in open source development specifically in PostgreSQL and Pgpool-II. He is a major contributor to the Pgpool-II project and has contributed to many performance and high availability related features.
Recent Comments