Using NGINX as a PostgreSQL Reverse Proxy and Load Balancer

Enterprise PostgreSQL Solutions


Introduction

This blog will go over the use cases of setting up NGINX (Engine X) as both a reverse proxy and a load balancer for PostgreSQL. NGINX is an excellent, feature-rich, open-source project that can help make an application more cohesive. It does this by exposing a single port to the user and proxying requests to different applications in a project based on the URI. This can make it appear as though our entire application is hosted on a single node, while in reality it can be distributed anywhere around the globe.

For the examples below we assume that the PostgreSQL server has some HTTP server sitting in front of it that can accept REST API requests and translate them to PostgreSQL-compatible statements.
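Such a front end is outside the scope of this blog, but to make the assumption concrete, here is a hypothetical Python sketch of the path-to-SQL translation a server like that might perform. The routes and the route_to_sql helper are invented purely for illustration; they are not part of NGINX or PostgreSQL.

```python
# Hypothetical sketch of the HTTP front end assumed above: it maps a
# REST-style path to a PostgreSQL statement. Route names are illustrative.

def route_to_sql(path: str) -> str:
    """Translate a REST API path into a PostgreSQL statement (illustrative)."""
    if path.startswith("/select/"):
        table = path.removeprefix("/select/")
        return f"SELECT * FROM {table};"
    if path.startswith("/insert/"):
        table = path.removeprefix("/insert/")
        return f"INSERT INTO {table} DEFAULT VALUES;"
    raise ValueError(f"no route for {path}")

print(route_to_sql("/select/users"))  # SELECT * FROM users;
```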

NGINX Reverse Proxy

Fig 1. NGINX Reverse Proxy Overview

Figure 1 shows a block diagram with an overview of how NGINX will proxy different requests to different applications. The file nginx.conf controls the configuration parameters of NGINX and is typically located at /etc/nginx/nginx.conf. To create a proxy like the one in Figure 1, we could add the following to our nginx.conf:

events {
    worker_connections 1024;
}

http {
    server {
        listen localhost:55432;

        location / {
            proxy_pass http://localhost:5433;
        }

        location /auth {
            proxy_pass http://localhost:8080;
        }

        location /stats {
            proxy_pass http://localhost:8081;
        }
    }
}

Once the configuration is set, run:

$ sudo nginx -s reload

This signals NGINX to reload its configuration and apply the changes we’ve made. (Running sudo nginx -t first will check the file for syntax errors before reloading.)

From top to bottom, let’s go over all the pieces of the NGINX configuration file.

First, at the very top, is the “events” context, which sets global parameters that dictate how NGINX runs at a general level. Within this context is the worker_connections directive, which sets how many simultaneous connections each NGINX worker process can handle.

Next is the “http” context, which contains directives that tell NGINX how to handle HTTP connections. Nested within the http context is the “server” context; each server context defines a new virtual server, and each virtual server has a “listen” directive naming the IP address and port it accepts connections on. Finally, within the server context, we have the “location” contexts. A location context defines a path to match, and when an HTTP request matches that path, the directives inside that location context are processed. For example, with location “/stats”, any request to localhost:55432/stats is passed on to port 8081, where our statistics server is running. The location “/” catches every request that did not match a more specific path, since NGINX selects the longest (most specific) matching prefix.

Overall, what this NGINX configuration does is:

  • Passes any requests to localhost:55432/auth to the authentication server running on localhost:8080
  • Passes any requests to localhost:55432/stats to the statistics server running on localhost:8081
  • Forwards all other requests to the PostgreSQL HTTP server running on port 5433
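The prefix matching described above can be sketched in a few lines of Python. This is a simplified model for intuition only, not how NGINX is implemented (real NGINX also supports exact-match and regex location modifiers):

```python
# Minimal sketch of NGINX prefix-location matching: among the locations
# whose path is a prefix of the request URI, the longest one wins.

LOCATIONS = {
    "/": "localhost:5433",      # PostgreSQL HTTP front end
    "/auth": "localhost:8080",  # authentication server
    "/stats": "localhost:8081", # statistics server
}

def match_location(uri: str) -> str:
    matches = [path for path in LOCATIONS if uri.startswith(path)]
    return LOCATIONS[max(matches, key=len)]  # longest prefix wins

print(match_location("/stats/daily"))  # localhost:8081
print(match_location("/users"))        # localhost:5433 (falls through to "/")
```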

NGINX Load Balancer

We can further add to this configuration by creating some PostgreSQL replica servers to distribute the load of read requests.

Fig 2. NGINX With Replica PostgreSQL Servers

In Figure 2 we now have the primary server and 2 replica servers in our block diagram. For applications that are very read-intensive this can be very useful, as write requests go to the primary while read requests can be divided among all 3 servers. Our new configuration might then look something like this:

events {
    worker_connections 1024;
}

http {
    server {
        listen localhost:55432;

        location / {
            proxy_pass http://localhost:5433;
        }

        location /select {
            proxy_pass http://backend;
        }

        location /auth {
            proxy_pass http://localhost:8080;
        }

        location /stats {
            proxy_pass http://localhost:8081;
        }
    }

    upstream backend {
        server localhost:5433;
        server localhost:5434;
        server localhost:5435;
    }
}

The main differences between this configuration and the previous one are the additional location context and the new “upstream” context. The new location matches HTTP requests to “/select”, which we assume is used to issue a SELECT statement on the PostgreSQL server. These SELECT statements are then passed to something referred to as “backend”. Looking at the bottom, we can see our “upstream” context named “backend”, which contains 3 servers listening on 3 separate ports. (Note that upstream blocks live directly inside the http context, alongside server blocks rather than inside them.) An upstream context defines a pool of servers that is balanced, by default, using the round-robin algorithm. If we wanted to use a different algorithm, we could define it as such:

upstream backend {
    least_conn;
    server localhost:5433;
    server localhost:5434;
    server localhost:5435;
}

This tells the pool to route each request to the server with the fewest active connections. Overall, the new configuration sends any SELECT statements (i.e., read requests) to whichever server has the fewest connections and sends all other requests (i.e., write requests) to the primary server.
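As a toy illustration (plain Python for intuition, not NGINX internals), the two strategies amount to the following; the connection counts are made up for the example:

```python
from itertools import cycle

# Toy sketch of the two balancing strategies discussed above.
backends = ["localhost:5433", "localhost:5434", "localhost:5435"]

# Round-robin (the default): each request goes to the next server in turn,
# wrapping back to the first after the last.
rr = cycle(backends)
first_four = [next(rr) for _ in range(4)]
print(first_four)  # 5433, 5434, 5435, then 5433 again

# least_conn: each request goes to the server with the fewest active
# connections; these counts are invented for illustration.
active = {"localhost:5433": 7, "localhost:5434": 2, "localhost:5435": 4}
picked = min(active, key=active.get)
print(picked)  # localhost:5434
```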

The Stream Context

The stream context is special in that it is not compiled into NGINX by default: NGINX must either be built from source with the --with-stream flag, or the stream module must be loaded dynamically with a load_module directive in the configuration file, as shown below. Like the http context, the stream context balances traffic, but instead of HTTP requests it balances raw TCP or UDP traffic, making it much more versatile. Until this point, we have been assuming that a REST server sits in front of the PostgreSQL database in order to handle HTTP requests and translate them into PostgreSQL commands. With stream, no such server is needed, since PostgreSQL speaks its own protocol directly over TCP. If we wanted to replicate our configuration above but with stream instead of http, we could do the following:

load_module /usr/lib/nginx/modules/ngx_stream_module.so;

events {
    worker_connections 1024;
}

http {
    server {
        # Moved to 55434 so it does not collide with the stream
        # servers listening on 55432 and 55433 below.
        listen localhost:55434;

        location /auth {
            proxy_pass http://localhost:8080;
        }

        location /stats {
            proxy_pass http://localhost:8081;
        }
    }
}

stream {
    server {
        listen localhost:55432 so_keepalive=on;

        # In the stream context, proxy_pass takes a plain
        # address rather than a URL.
        proxy_pass localhost:5433;
    }

    server {
        listen localhost:55433;

        proxy_pass backend;
    }

    upstream backend {
        server localhost:5433;
        server localhost:5434;
        server localhost:5435;
    }
}

With this configuration, we now have 3 separate virtual servers running under NGINX. The first is the same HTTP server as before, which simply forwards HTTP requests to either the authentication or statistics server. The other 2 are stream servers that take in TCP traffic on ports 55432 and 55433. Port 55432 is forwarded to the primary server, and port 55433 is load balanced among the replicas and the primary server. In this setup a user could send write requests to the replica servers; however, they would be rejected, as replicas are read-only. With this configuration we can also use psql to connect to a PostgreSQL database through NGINX.

$ psql -U postgres -d postgres -p 55432

Even though our PostgreSQL server is listening on port 5433, we can connect to port 55432 with psql and still be connected to the correct server since all the traffic is proxied. Similarly, if we issue SELECT statements through psql to port 55433, each of the replicas and primary will take turns answering the request since it is still load balanced through round-robin.
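To make the stream behaviour concrete, here is a toy, single-connection TCP relay in Python: it accepts a client, forwards the bytes verbatim to a stand-in backend, and relays the reply back. This is essentially what the stream servers above do for the PostgreSQL wire protocol, except that real NGINX multiplexes many connections concurrently; the ports here are OS-assigned and the uppercasing backend is a stand-in, not PostgreSQL.

```python
import socket
import threading

# Bind both listeners up front (port 0 lets the OS pick free ports).
backend_srv = socket.create_server(("127.0.0.1", 0))  # stand-in for PostgreSQL
proxy_srv = socket.create_server(("127.0.0.1", 0))    # stand-in for NGINX stream
backend_addr = backend_srv.getsockname()
proxy_addr = proxy_srv.getsockname()

def backend():
    # "Answer" one query by echoing it back in upper case.
    conn, _ = backend_srv.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())

def proxy():
    # Relay one request upstream and one reply back: the essence of
    # what the stream context's proxy_pass does, for a single connection.
    client, _ = proxy_srv.accept()
    with client, socket.create_connection(backend_addr) as upstream:
        upstream.sendall(client.recv(1024))
        client.sendall(upstream.recv(1024))

threading.Thread(target=backend, daemon=True).start()
threading.Thread(target=proxy, daemon=True).start()

# The "psql" side: connect to the proxy, not the backend, and still get served.
with socket.create_connection(proxy_addr) as s:
    s.sendall(b"select 1")
    reply = s.recv(1024)
print(reply)  # b'SELECT 1'
```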

Conclusion

In this blog, we went over multiple features of NGINX that can help to both expand your project and also make it more cohesive. We first went over setting up NGINX as a reverse proxy to pass HTTP requests to different modules in a project. Then we went over setting up NGINX as a load balancer for both HTTP and TCP connections. Overall, NGINX is an incredibly useful piece of software that anyone making a distributed application should consider.
