Consensus based failover with Pgpool-II

Enterprise PostgreSQL Solutions


Pgpool-II is probably the most comprehensive clustering solution available today for PostgreSQL. It provides a wide range of features, including connection pooling, load balancing, automatic failover, and high availability. Using Pgpool-II for load balancing and for building a highly available PostgreSQL cluster is one of its most common use cases.

Because Pgpool-II is a proxy server that sits between PostgreSQL and client applications, building an HA setup with Pgpool-II requires redundancy not only of the PostgreSQL servers (primary and standby) but also of the Pgpool-II servers themselves: if one Pgpool-II fails, another must take over so that the database service remains unaffected.

To solve the single point of failure (SPOF) of the Pgpool-II service, Pgpool-II has a built-in watchdog module. Although the core function of the watchdog is to remove the Pgpool-II node as a single point of failure by coordinating multiple Pgpool-II nodes, it can also complement Pgpool-II’s automatic failover and backend health checking.

This post is about the quorum-aware and consensus-based backend failover functionality of the Pgpool-II watchdog.

Failover is an expensive operation

Failover is the mechanism in Pgpool-II that removes problematic PostgreSQL servers from the cluster and, if the primary database fails, automatically promotes a standby PostgreSQL server to be the new primary, so that the service keeps running.

Although failover is a basic building block of high availability, it must be avoided as much as possible. No matter how small the replication delay between primary and standby, there is always a chance of losing some recent data when the primary PostgreSQL server fails over. We also always lose the current sessions and the transactions that are running at the time of failover. On top of that, after the failover the cluster is left with one less database node, and in most cases reattaching the failed node requires manual intervention. So it is very important to do a failover only when it is absolutely necessary.

Consensus based failover

Failover is normally triggered when Pgpool-II’s backend health checking module reports that a backend PostgreSQL server is unreachable. The health check module exposes many configuration parameters for retries and timeouts, so that it can do proper due diligence before declaring a database server dead. This works in most cases, but it is still not enough to cater to all the problems that can occur in a network-based system.

For instance, the health check module of a standalone Pgpool-II server has no way to tell whether a failure is caused by a broken link between Pgpool-II and the PostgreSQL server or by an actual database system failure. There can be many cases where Pgpool-II’s health check cannot connect to a totally healthy PostgreSQL server because of some localized issue between Pgpool-II and that PostgreSQL server, while the server remains accessible from all other clients and servers.

To guard against such localized network failures, version 3.7 added a quorum and consensus based backend failover feature to the Pgpool-II watchdog. When this feature is enabled, instead of acting directly on the failures reported by its health check, the Pgpool-II node consults the other Pgpool-II nodes in the watchdog cluster to validate the failure before proceeding with the failover.

The consensus based backend failover exposes four configuration parameters, which are used to enable, disable, and fine-tune it.


failover_when_quorum_exists

This is effectively an on/off switch for the consensus and quorum based failover mechanism in Pgpool-II. When enabled, Pgpool-II acts on health-check failures only when the watchdog cluster holds the quorum.

We can say that “quorum exists” if the number of live watchdog nodes (that is, the number of live Pgpool-II nodes) forms a majority of the total number of watchdog nodes. For example, suppose the number of watchdog nodes is 5. If the number of alive nodes is greater than or equal to 3, then quorum exists. On the other hand, if the number of alive nodes is 2 or lower, quorum does not exist.
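The majority rule above can be written down in a few lines. This is only an illustrative sketch of the arithmetic, not Pgpool-II code; the function name `quorum_exists` is made up for this example, and it assumes the default strict-majority rule.

```python
def quorum_exists(alive_nodes: int, total_nodes: int) -> bool:
    """Return True when the alive watchdog nodes form a strict majority.

    Hypothetical helper for illustration only; it is not part of Pgpool-II.
    """
    if not 0 <= alive_nodes <= total_nodes:
        raise ValueError("alive_nodes must be between 0 and total_nodes")
    # Strict majority: more than half of all configured nodes are alive.
    return alive_nodes * 2 > total_nodes


# Five-node watchdog cluster from the example in the text.
for alive in range(6):
    print(f"{alive} of 5 alive -> quorum: {quorum_exists(alive, 5)}")
```

With five nodes, the check turns true at three alive nodes and stays false at two or fewer, matching the example.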


failover_require_consensus

This parameter enables or disables consensus based failover. Setting it to on makes Pgpool-II hold an election to verify each health-check triggered failover request, and the failover is performed only when a majority of the Pgpool-II nodes in the watchdog cluster agree on the failure.

For example, in a three-node watchdog cluster, the failover is performed only once at least two nodes have asked for a failover of the particular backend node.
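The election described here can be pictured as a ballot per backend node, where each Pgpool-II node counts once. The sketch below is a simplified model under that assumption; the class `FailoverBallot` and the node names `pgpool-0` and `pgpool-1` are hypothetical and do not correspond to Pgpool-II internals.

```python
from typing import Set


class FailoverBallot:
    """Toy model of a failover election for one backend node (not Pgpool-II code)."""

    def __init__(self, total_nodes: int) -> None:
        self.total_nodes = total_nodes
        self.voters: Set[str] = set()  # each watchdog node is counted once

    def vote(self, node_name: str) -> bool:
        """Record a failover request and report whether a majority now agrees."""
        self.voters.add(node_name)
        return len(self.voters) * 2 > self.total_nodes


ballot = FailoverBallot(total_nodes=3)
print(ballot.vote("pgpool-0"))  # one node alone is not a majority
print(ballot.vote("pgpool-0"))  # a repeated request from the same node changes nothing
print(ballot.vote("pgpool-1"))  # a second node agrees, so the majority is reached
```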


allow_multiple_failover_requests_from_node

This parameter is basically a negation of true democracy, but it can still be useful in certain cases.

It works in conjunction with failover_require_consensus. When enabled, a single Pgpool-II node can cast multiple votes for the failover and force Pgpool-II’s hand to do the failover even when the majority doesn’t agree.

A Pgpool-II node casts a vote in favor of failover at every health-check failure. For example, if the health check is configured to run every ten seconds, then in case of a persistent failure Pgpool-II casts a vote for failover at each health-check failure (every 10 seconds in this case). By default, all votes after the first one are ignored. However, if allow_multiple_failover_requests_from_node is enabled, Pgpool-II counts every vote, and the failover is triggered as soon as the number of votes reaches the required majority count.

allow_multiple_failover_requests_from_node is useful for detecting a persistent error that the other watchdog nodes might not find.
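The difference between the two modes comes down to whether repeated requests from the same node are counted. The following is a minimal sketch of that counting rule, assuming a strict-majority threshold; the function names and the node name `pgpool-0` are invented for illustration and are not Pgpool-II APIs.

```python
from typing import List


def votes_counted(requests: List[str], allow_multiple: bool) -> int:
    """Count failover votes from a list of requesting node names.

    Each entry stands for one failed health check on the named node.
    """
    if allow_multiple:
        return len(requests)       # every request counts as a vote
    return len(set(requests))      # only one vote per node is counted


def failover_triggered(requests: List[str], total_nodes: int, allow_multiple: bool) -> bool:
    """Return True once the counted votes reach a strict majority."""
    return votes_counted(requests, allow_multiple) * 2 > total_nodes


# One node in a three-node cluster fails its health check twice in a row.
requests = ["pgpool-0", "pgpool-0"]
print(failover_triggered(requests, 3, allow_multiple=False))
print(failover_triggered(requests, 3, allow_multiple=True))
```

In this model a single node reaches the two votes needed in a three-node cluster after its second failed health check, but only when multiple requests are allowed.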


enable_consensus_with_half_votes

This parameter configures how Pgpool-II computes the majority rule when calculating the quorum and resolving the consensus for failover.

When enabled, the existence of quorum and the consensus on failover require only half of the total number of votes configured in the cluster. Otherwise, both decisions require at least one more vote than half of the total. In both cases, whether deciding on quorum existence or building the consensus on failover, this parameter only comes into play when the watchdog cluster is configured with an even number of Pgpool-II nodes.

For example, with enable_consensus_with_half_votes enabled in a two-node watchdog cluster, a single alive Pgpool-II node is enough for the quorum to exist. Otherwise, both nodes must be alive to complete the quorum.
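The effect of this parameter on the vote threshold can be tabulated for a few cluster sizes. The sketch below only restates the rule described above; `required_votes` is a hypothetical helper, not a Pgpool-II function.

```python
def required_votes(total_nodes: int, half_votes: bool = False) -> int:
    """Votes needed for quorum or failover consensus (illustrative only).

    half_votes mirrors the enable_consensus_with_half_votes setting and
    matters only when total_nodes is even.
    """
    if half_votes and total_nodes % 2 == 0:
        return total_nodes // 2        # exactly half is enough
    return total_nodes // 2 + 1        # otherwise a strict majority is needed


for nodes in (2, 3, 4, 5):
    print(nodes, required_votes(nodes, half_votes=False), required_votes(nodes, half_votes=True))
```

For a three-node cluster the setting changes nothing: two votes are needed either way, so a single surviving node cannot hold the quorum.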

More on failover consensus

Quarantined backend nodes.

Pgpool-II requires all attached backend nodes to be reachable at all times to work properly, and failed backend nodes need to be detached so that it stops sending SQL to them. But when a backend PostgreSQL server becomes unreachable from one Pgpool-II server while the rest of the Pgpool-II nodes in the watchdog cluster disagree on that failure, that PostgreSQL backend is quarantined (only on the Pgpool-II node that cannot connect to it) until it becomes reachable again or consensus is reached.

When the master watchdog node fails to build consensus on the failure of a standby backend node, it takes no action. Similarly, quarantined standby backend nodes on the watchdog master do not trigger a new leader election.

Quarantined nodes are effectively the same as detached backends, except that they do not cause a failover (if the primary node is quarantined, no standby gets promoted) and they are attached back to the cluster as soon as they become reachable again.
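One way to picture the difference between quarantine and failover is as a small decision table seen from a single Pgpool-II node. This is a deliberately simplified model built only from the description above; the state names and functions are hypothetical and do not reflect how Pgpool-II is implemented.

```python
from enum import Enum


class BackendState(Enum):
    ATTACHED = "attached"
    QUARANTINED = "quarantined"   # local to the node that cannot reach the backend
    DETACHED = "detached"         # failover performed cluster-wide


def backend_state(reachable_locally: bool, consensus_reached: bool) -> BackendState:
    """Simplified view of a backend as seen from one Pgpool-II node."""
    if consensus_reached:
        return BackendState.DETACHED
    if not reachable_locally:
        return BackendState.QUARANTINED
    return BackendState.ATTACHED


def promotes_standby(state: BackendState, is_primary: bool) -> bool:
    """Only a real failover of the primary promotes a standby."""
    return state is BackendState.DETACHED and is_primary


state = backend_state(reachable_locally=False, consensus_reached=False)
print(state.value, promotes_standby(state, is_primary=True))
```

In this model a quarantined primary never triggers a promotion, and the backend returns to the attached state as soon as it is reachable again.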

Consensus based failover key considerations

Consensus based failover is a very powerful and useful feature, but it must be configured correctly to work properly; otherwise it can cause more problems than it solves.

Health check must be enabled on all Pgpool-II nodes.

Building a consensus for failover requires at least a majority of nodes to agree on a backend node failure. Pgpool-II relies on its health checking to detect backend failures, so if the health check is not enabled on all Pgpool-II nodes in the cluster, or the health-check intervals and timeouts are not consistent across the cluster, building a consensus can take longer than desired, and the cluster may never reach a consensus at all, even for a genuine node failure.
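A quick way to catch this kind of misconfiguration is to compare the health check settings of all Pgpool-II nodes side by side. The sketch below assumes the settings have already been read into plain dictionaries keyed by node name; the node names and values are made up, and the helper is not a Pgpool-II tool. It treats a `health_check_period` of 0 as "health check disabled", which is how pgpool.conf interprets that value.

```python
from typing import Dict, List

HEALTH_CHECK_KEYS = (
    "health_check_period",
    "health_check_timeout",
    "health_check_max_retries",
    "health_check_retry_delay",
)


def health_check_problems(configs: Dict[str, Dict[str, int]]) -> List[str]:
    """List health check settings that are disabled or differ between nodes."""
    problems = []
    for node, conf in sorted(configs.items()):
        if conf.get("health_check_period", 0) == 0:
            problems.append(f"{node}: health check is disabled")
    for key in HEALTH_CHECK_KEYS:
        values = {conf.get(key) for conf in configs.values()}
        if len(values) > 1:
            problems.append(f"{key} differs across nodes")
    return problems


base = {
    "health_check_period": 10,
    "health_check_timeout": 20,
    "health_check_max_retries": 3,
    "health_check_retry_delay": 1,
}
cluster = {
    "pgpool-0": dict(base),
    "pgpool-1": dict(base),
    "pgpool-2": dict(base, health_check_period=0),  # misconfigured node
}
for problem in health_check_problems(cluster):
    print(problem)
```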

Use an odd (minimum 3) number of Pgpool-II nodes

As with every other distributed system, building a consensus requires a clear majority. To ensure the quorum and consensus mechanism works properly, the number of Pgpool-II nodes must be odd and at least three. Although Pgpool-II still tries to reach a consensus with an even number of nodes in the watchdog cluster, such a setup is prone to split-brain in some rare scenarios.


When the quorum exists, Pgpool-II can do a better job of failure detection: even if one watchdog node mistakenly detects a failure of a backend node, the majority of the watchdog nodes will reject it.

Quorum aware and consensus based failover is a very powerful and useful feature that can drastically improve the robustness and reliability of Pgpool-II failover and help guard against false alarms and temporary network glitches. Moreover, because it adds a second line of defense against false failovers, we can set lower values for the health check retries and, in case of a genuine failure, proceed with the failover much more quickly. This takes away the worry of false alarms and network glitches and improves the overall availability of the system.

One Response

  1. muhammad usama satria says:


    Thanks for the explanation.
    Just want to confirm, if INITIALLY, I set up a 3-node Pgpool (node-0, node-1, and node-2 with enable_consensus_with_half_votes=on).
    When I shutdown node-0, Pgpool master (and virtual ip) is successfully assigned to node-1.
    So now on left node-1 and node-2.
    But then when I shutdown node-1, Pgpool master (and virtual ip) is not assigned to node-2.

    log message in node-2:
    I am the cluster leader node but we do not have enough nodes in cluster
    waiting for the quorum to start escalation process

    Is this the expected behavior?
    I thought by setting enable_consensus_with_half_votes=on, will make the Pgpool master (and virtual ip) assigned to node-2.

