Deploy Pgpool on K8S as Load Balancer

Introduction

Following up on my previous blog here, which outlined the procedure to deploy a metric server cron job for monitoring PostgreSQL primary and standby nodes already running on Kubernetes, this blog shows how to deploy a pgpool node that load balances write requests to the primary and read requests to one or more standby nodes.

In this blog we focus on the deployment of pgpool itself, not the PostgreSQL primary and standby nodes, which are assumed to be already deployed; we will cover those in the next blog.

The architecture of the deployment is as follows: a psql client connects from outside the cluster to a NodePort service, which forwards traffic to the pgpool pod; pgpool in turn routes writes to the PostgreSQL primary and reads to one or more standbys.

We will write a deployment manifest and a ConfigMap YAML file to deploy pgpool, and define a Kubernetes service called pgpool-service of type NodePort so that a psql client can access pgpool from outside the cluster.
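
If the NodePort is not directly reachable from your workstation, kubectl port-forward is an alternative way to reach the same service. A minimal sketch, using the service name and port defined in the manifest below:

$ kubectl port-forward service/pgpool-service 9999:9999
# then, from another terminal:
$ psql -d postgres -p 9999 -h 127.0.0.1 -U postgres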

Prerequisite

You will need to install minikube, which provides a local Kubernetes cluster to run the deployment, and kubectl to actually deploy the resources.
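
A minimal sketch of bringing the local cluster up once both tools are installed:

$ minikube start
$ kubectl get nodes   # the single minikube node should report STATUS Ready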

Pgpool Deployment Manifest

pgpool-deploy.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgpool
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pgpool
  template:
    metadata:
      labels:
        app: pgpool
    spec:
      containers:
      - name: pgpool
        image: bitnami/pgpool:v4.4.1
        # alternative image: pgpool/pgpool
        imagePullPolicy: Always
        env:
        - name: POSTGRES_USERNAME
          valueFrom:
            secretKeyRef:
              name: mypostgres-postgres-secret
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mypostgres-postgres-secret
              key: password
        - name: PGPOOL_PASSWORD_ENCRYPTION_METHOD
          value: "scram-sha-256"
        - name: PGPOOL_ENABLE_POOL_PASSWD
          value: "true"
        - name: PGPOOL_SKIP_PASSWORD_ENCRYPTION
          value: "false"
        # The following settings are only required when using the Pgpool-II PCP commands.
        # To enable them, you must first create a secret that stores the PCP user's
        # username and password.
        - name: PGPOOL_PCP_USER
          valueFrom:
            secretKeyRef:
              name: pgpool-pcp-secret
              key: username
        - name: PGPOOL_PCP_PASSWORD
          valueFrom:
            secretKeyRef:
              name: pgpool-pcp-secret
              key: password
        volumeMounts:
        - name: pgpool-config
          mountPath: /config
      volumes:
      - name: pgpool-config
        configMap:
          name: pgpool-config
---
apiVersion: v1
kind: Service
metadata:
  name: pgpool-service
spec:
  type: NodePort
  selector:
    app: pgpool
  ports:
  - name: pgpool-port
    port: 9999
    targetPort: 9999
    nodePort: 30000

In pgpool-deploy.yml, we pull the PostgreSQL and Pgpool PCP credentials from Kubernetes secrets via secretKeyRef so that their values are not visible in the .yml file. The secrets must be created prior to deployment:

$ kubectl create secret generic mypostgres-postgres-secret --from-literal=username=cloud_admin --from-literal=password=cloud_admin
$ kubectl create secret generic pgpool-pcp-secret --from-literal=username=postgres --from-literal=password=postgres

At deployment time, Kubernetes fetches the values from the secrets created above and assigns them to the associated environment variables of the pgpool container: POSTGRES_USERNAME, POSTGRES_PASSWORD, PGPOOL_PCP_USER, and PGPOOL_PCP_PASSWORD.
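
To confirm the secrets hold the expected values, you can decode them manually (Kubernetes stores secret data base64-encoded):

$ kubectl get secret mypostgres-postgres-secret -o jsonpath='{.data.username}' | base64 -d
cloud_admin
$ kubectl get secret pgpool-pcp-secret -o jsonpath='{.data.username}' | base64 -d
postgres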

Pgpool configmap Manifest

pgpool-configmap.yml

apiVersion: v1
kind: ConfigMap
metadata:
  name: pgpool-config
  labels:
    name: pgpool-config
data:
  pgpool.conf: |-
    listen_addresses = '*'
    port = 9999
    socket_dir = '/var/run/pgpool'
    pcp_listen_addresses = '*'
    pcp_port = 9898
    pcp_socket_dir = '/var/run/pgpool'
    backend_hostname0 = 'primary-service'
    backend_port0 = 5432
    backend_weight0 = 1
    backend_flag0 = 'ALWAYS_PRIMARY|DISALLOW_TO_FAILOVER'
    backend_hostname1 = 'standby-service'
    backend_port1 = 5432
    backend_weight1 = 1
    backend_flag1 = 'DISALLOW_TO_FAILOVER'
    sr_check_period = 0
    sr_check_user = 'postgres'
    health_check_period = 0
    health_check_user = 'postgres'
    backend_clustering_mode = 'streaming_replication'
    num_init_children = 32
    max_pool = 4
    child_life_time = 300
    child_max_connections = 0
    connection_life_time = 0
    client_idle_limit = 0
    connection_cache = on
    load_balance_mode = on
    ssl = on
    enable_pool_hba = off
    failover_on_backend_error = off
    log_min_messages = warning
    statement_level_load_balance = on
    log_statement = on
    log_per_node_statement = on

This ConfigMap manifest is rendered into a pgpool.conf file that pgpool loads at startup. Note that we defined the primary backend as primary-service and the standby backend as standby-service; these correspond to the two Kubernetes services of type ClusterIP that bring traffic to the postgres-primary and postgres-standby pods. Also note that we enabled statement_level_load_balance, so pgpool picks a load balancing node per statement, forwarding write traffic to the primary node and read-only traffic to the standby nodes.
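
For reference, the two backend services might look like the minimal sketch below. The selector labels (app: postgres-primary, app: postgres-standby) are assumptions for illustration and must match the labels on your actual primary and standby pods:

apiVersion: v1
kind: Service
metadata:
  name: primary-service
spec:
  type: ClusterIP
  selector:
    app: postgres-primary    # assumed label; must match your primary pod
  ports:
  - port: 5432
    targetPort: 5432
---
apiVersion: v1
kind: Service
metadata:
  name: standby-service
spec:
  type: ClusterIP
  selector:
    app: postgres-standby    # assumed label; must match your standby pods
  ports:
  - port: 5432
    targetPort: 5432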

Since postgres-standby is deployed as a Kubernetes deployment, it can be scaled up or down (deploying more or fewer replica pods). When this happens, there is no need to update pgpool's ConfigMap to add new standby nodes or remove old ones: pgpool keeps pointing at standby-service, and Kubernetes automatically distributes that traffic among all available standby pods.
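
For example, scaling the standby deployment (named postgres-replica-deployment in the pod listing below) requires no pgpool change at all:

$ kubectl scale deployment postgres-replica-deployment --replicas=3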

Deploy Pgpool

When we are ready, we deploy everything in order, secrets first, then the ConfigMap, then the deployment and service:

$ kubectl create secret generic mypostgres-postgres-secret --from-literal=username=cloud_admin --from-literal=password=cloud_admin
$ kubectl create secret generic pgpool-pcp-secret --from-literal=username=postgres --from-literal=password=postgres
$ kubectl apply -f pgpool-configmap.yml
$ kubectl apply -f pgpool-deploy.yml

$ kubectl get pod
NAME                                           READY   STATUS    RESTARTS   AGE
postgres-primary                               1/1     Running   0          1m
postgres-replica-deployment-69f995884b-72v6g   1/1     Running   0          9s
postgres-replica-deployment-69f995884b-kkkn5   1/1     Running   0          9s
pgpool-59b77dbc76-6nr45                        1/1     Running   0          1m
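
You can also confirm that the NodePort service is exposed as expected; the PORT(S) column should read 9999:30000/TCP, matching the manifest:

$ kubectl get service pgpool-service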

Observe Pgpool Load Balancing

Since we set log_statement = on and log_per_node_statement = on, we can run some queries through pgpool with psql and observe how the incoming queries are distributed across the backends. Because pgpool-service is a NodePort service, we connect to port 30000 on the minikube node's IP:

$ psql -d postgres -p 30000 -h $(minikube ip) -U postgres

postgres=# show pool_nodes;
 node_id |    hostname     | port | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+-----------------+------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | primary-service | 5432 | up     | unknown   | 0.500000  | primary | unknown | 13         | false             | 0                 |                   |                        | 2023-06-16 18:31:43
 1       | standby-service | 5432 | up     | unknown   | 0.500000  | standby | unknown | 11         | true              | 0                 |                   |                        | 2023-06-16 18:31:43
(2 rows)

postgres=# select count(*) from t1;
postgres=# select count(*) from t1;
postgres=# select count(*) from t1;
... 

Tail the pgpool log to observe the load balancing between node id 0 and node id 1, indicated by the DB node id: value in each log line:

$ kubectl logs pgpool-59b77dbc76-6nr45 -f
2023-06-16 18:31:51.804: psql pid 71: LOG:  DB node id: 0 backend pid: 40 statement: select count(*) from t1;
2023-06-16 18:32:15.520: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:15.521: psql pid 71: LOG:  DB node id: 0 backend pid: 40 statement: select count(*) from t1;
2023-06-16 18:32:24.776: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:24.776: psql pid 71: LOG:  DB node id: 1 backend pid: 34 statement: select count(*) from t1;
2023-06-16 18:32:25.451: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:25.451: psql pid 71: LOG:  DB node id: 0 backend pid: 40 statement: select count(*) from t1;
2023-06-16 18:32:26.036: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:26.036: psql pid 71: LOG:  DB node id: 0 backend pid: 40 statement: select count(*) from t1;
2023-06-16 18:32:26.566: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:26.566: psql pid 71: LOG:  DB node id: 0 backend pid: 40 statement: select count(*) from t1;
2023-06-16 18:32:27.431: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:27.432: psql pid 71: LOG:  DB node id: 0 backend pid: 40 statement: select count(*) from t1;
2023-06-16 18:32:27.960: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:27.960: psql pid 71: LOG:  DB node id: 1 backend pid: 34 statement: select count(*) from t1;
2023-06-16 18:32:28.511: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:28.511: psql pid 71: LOG:  DB node id: 0 backend pid: 40 statement: select count(*) from t1;
2023-06-16 18:32:29.007: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:29.007: psql pid 71: LOG:  DB node id: 1 backend pid: 34 statement: select count(*) from t1;
2023-06-16 18:32:35.295: psql pid 71: LOG:  statement: insert into t1 values(generate_series(1,10000), '555');
2023-06-16 18:32:35.296: psql pid 71: LOG:  DB node id: 0 backend pid: 40 statement: insert into t1 values(generate_series(1,10000), '555');
2023-06-16 18:32:36.397: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:36.398: psql pid 71: LOG:  DB node id: 1 backend pid: 34 statement: select count(*) from t1;
2023-06-16 18:32:37.252: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:37.252: psql pid 71: LOG:  DB node id: 1 backend pid: 34 statement: select count(*) from t1;
2023-06-16 18:32:38.052: psql pid 71: LOG:  statement: select count(*) from t1;
2023-06-16 18:32:38.052: psql pid 71: LOG:  DB node id: 1 backend pid: 34 statement: select count(*) from t1;
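
Because we configured the PCP credentials earlier, the PCP utilities offer another way to inspect the backends. A sketch, assuming the pcp tools are available inside the pgpool container (they ship as part of a standard Pgpool-II installation):

$ kubectl exec -it pgpool-59b77dbc76-6nr45 -- pcp_node_count -h localhost -p 9898 -U postgres
Password:
2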

Reference

deploy pgpool on kubernetes