Setting Up a PostgreSQL Replica Server Locally

Enterprise PostgreSQL Solutions

Comments are off

Setting Up a PostgreSQL Replica Server Locally

Getting Started

This blog is aimed at beginners who want to practice the fundamentals of database replication in PostgreSQL but who might not have access to a remote server. I believe it is essential when learning a new technology to go through examples on one’s own machine in order to solidify concepts. This can be difficult with replicas as many of the resources readily available assume the user has some PostgreSQL experience and access to another server on which to run the replica. We will not assume any of that here, the only prerequisite is that you already have Postgres installed and can login to the interactive shell.

Prerequisites

For these examples, we used Ubuntu 22.10 to run PostgreSQL 16 (development version). We are going to use as our data directory $HOME/pg/data. This first example will go over physical replication using streaming, followed by a shorter example with physical replication using shipping, then an example of logical replication, ending with a summary of the different replication types and their features.

Physical Replication With Streaming

Let’s start by logging into the primary server as a superuser, typically this is “postgres” the default user.

psql -U postgres

Once in the shell, create a new role for the replication, create a table in the primary server, insert some data into this table and then logout.

postgres=# CREATE ROLE rep_user WITH REPLICATION LOGIN PASSWORD 'rep_pass';
postgres=# create table t1(a int, b int);
postgres=# insert into t1 values (1,2);
postgres=# \q

Open $HOME/pg/data/postgres.conf and ensure

listen_addresses = 'localhost'

Then, append the following to the end of $HOME/pg/data/pg_hba.conf

host  replication   rep_user  localhost   md5

Run the following command to create a backup of the primary server which will then be used as the data directory for our replica server (located at $HOME/pg/rep).

$ pg_basebackup -h localhost -U rep_user -X stream -C -S replica_1 -v -R -W -D $HOME/pg/rep

After this directory has been created, open $HOME/pg/rep/postgres.conf and set the port number to something other than the PostgreSQL default 5432.

port = 5433

While still in postgres.conf edit the line primary_conninfo to appear something like the following:

primary_conninfo = 'dbname=postgres user=postgres host=localgost port=5432 sslmode=disable'

The last thing we must do before starting the servers is to create a dummy standby.signal file in the secondary server’s data folder. Enter the following in the command line:

$ touch $HOME/pg/rep/standby.signal

Open up a second terminal as we are going to start two separate instances of PostgreSQL on our host machine, each on different ports.

In the first terminal, our primary server, enter:

pg_ctl -D $HOME/pg/data start

In the second terminal, our replica server, enter:

pg_ctl -D $HOME/pg/rep start

Each should give an output stating that the server is started. We can confirm this by typing:

$ ps -aux | grep postgres
tristen    29088  0.0  0.1 175596 18580 ?        Ss   15:42   0:00 /home/tristen/disk/pgapp/bin/postgres -D /home/tristen/pg/data
tristen    29089  0.0  0.0 175728  2388 ?        Ss   15:42   0:00 postgres: checkpointer 
tristen    29090  0.0  0.0 175752  2364 ?        Ss   15:42   0:00 postgres: background writer 
tristen    29092  0.0  0.0 175596  7424 ?        Ss   15:42   0:00 postgres: walwriter 
tristen    29093  0.0  0.0 177196  4892 ?        Ss   15:42   0:00 postgres: autovacuum launcher 
tristen    29094  0.0  0.0 177176  4984 ?        Ss   15:42   0:00 postgres: logical replication launcher 
tristen    29112  0.0  0.1 175596 18456 ?        Ss   15:42   0:00 /home/tristen/disk/pgapp/bin/postgres -D /home/tristen/pg/rep
tristen    29113  0.0  0.0 175728  2564 ?        Ss   15:42   0:00 postgres: checkpointer 
tristen    29114  0.0  0.0 175596  2564 ?        Ss   15:42   0:00 postgres: background writer 
tristen    29115  0.0  0.0 176348  4488 ?        Ss   15:42   0:00 postgres: startup recovering 000000010000000000000014
tristen    29116  0.0  0.0 176136  3664 ?        Ss   15:42   0:00 postgres: walreceiver streaming 0/14000110
tristen    29117  0.0  0.0 177336  7060 ?        Ss   15:42   0:00 postgres: walsender rep_user 127.0.0.1(39650) streaming 0/14000110
tristen    29126  0.0  0.0  17580  2252 pts/2    S+   15:42   0:00 grep --color=auto postgres

We can see we have two processes of $HOME/disk/pgapp/bin/postgres running with the two directories.

Now, we will connect to each of the two instances. First the primary server on our first terminal:

psql -U postgres -p 5432

Then the replica server on our second terminal:

psql -U postgres -p 5433

The -p flag is used to specify the port. Using this on the primary server is technically redundant since it uses the default port. However, this is good to show that we are indeed connecting to two different ports and thus two different instances of PostgreSQL.

If we enter \d into the Postgres shell, we can get a list of tables that are in the database. Doing so in the second terminal corresponding to our replica server we can see that our table t1 was indeed copied over.

postgres=# \d
        List of relations
 Schema | Name | Type  |  Owner   
--------+------+-------+----------
 public | t1   | table | postgres
(1 row)

Let’s create a new table on the primary server to show that it will be copied over automatically to the replica.

On our first terminal enter:

postgres=# create table t2(c int, d text);
CREATE TABLE
postgres=# insert into t2 values (3, 'hello');
INSERT 0 1

In the second terminal enter:

postgres=# select * from t2;
 c |   d   
---+-------
 3 | hello
(1 row)

We can see that not only was the new table copied over from the primary server but so was the data we just inserted.

It is important to note that this relationship is only one way. That is, the replica is specifically a copy of the primary server. If we tried to create a table in the replica server we would get an error stating:

ERROR:  cannot execute CREATE TABLE in a read-only transaction

This follows logically as we want the replica to be read-only to make the data from the primary server more readily available.

Physical Replication With Shipping

Shipping has a similar setup to streaming, we mostly just change some config files. That being said, on the primary server edit postgresql.conf

wal_level = replica
archive_mode = on
archive_command = 'cp %p /path/to/archive/%f'

Restart the server if it is running and create a backup of the primary server:

$ pg_basebackup -h localhost -U rep_user -X fetch -v -R -W -D $HOME/pg/rep

Open $HOME/pg/rep/postgresql.conf and edit two lines. Comment out the line starting with archive_command and uncomment the line starting with restore_command adding the command to use when restoring after the equality operator. The lines should look like the ones below:

#archive_command = 'cp %p /path/to/archive/%f'
restore_command = 'cp /path/to/archive/%f %p'

Startup both servers like in the streaming example and you should see the replica copying the primary. Note that shipping changes may not appear immediately as they do in streaming. Streaming has higher latency and only sends changes when WAL files are filled to the replicas.

Logical Replication

Logical replication like physical has the same beginning setup with the servers. It involves some different configs but also some manual edits for tables to work properly.

On the primary server edit postgresql.conf

wal_level = logical

Make sure that pga_hba.conf on the primary server contains the correct connection permissions for the replication user.

host     all     rep_user     localhost     md5

Once all the configurations are ready, we want to start both servers and login to the interactive shell. Recall that each server is using a different port number in our example, so make sure you aren’t using the same server twice! In each server’s respective shell, we want to create a table in each with the same name and schema. If the tables are at all different, the replica will not be able to find it and we will get an error. That being said, enter the following in both terminals:

postgres=# create table t1(id int, val text);

Now that we have the same empty table in both databases, we can move on to replicating the table. On the primary server enter the following command to create a publication for table t1:

postgres=# CREATE PUBLICATION pub_t1 FOR TABLE t1;

This creates a publication for table t1 which means that the table t1 will be replicated to the secondary server. We can have as many tables as we want to replicate as part of the publication.

Once the publication is created, we can subscribe table t1 in the secondary database to table t1 in the primary database. To do this we must create a subscription in the secondary database as follows:

postgres=# CREATE SUBSCRIPTION sub_t1 CONNECTION 'dbname=postgres host=localhost port=5432 user=postgres' PUBLICATION pub_t1;

Note, that for this example we used both the default user postgres and the default database postgres, this is not necessary.

Now that we are subscribed we will see all updates in table t1 in the primary server appear in table t1 of the secondary server. Any other table on the primary server not part of the publication we made will not be copied over to the replica. Let’s see an example of that now, in the primary database terminal insert some data into table t1:

postgres=# insert into t1 values (1, 'hello');
INSERT 0 1
postgres=# insert into t1 values (2, 'world');
INSERT 0 1

Now let’s see if the table was actually replicated in the secondary server:

postgres=# select * from t1;
 id |  val  
----+-------
  1 | hello
  2 | world
(2 rows)

And there we have it, the data was successfully copied over to the replica server. But how do we know this is actually doing anything different than Physical replication? Well lets add a new table to our primary database and insert some data:

postgres=# create table t2(c int, d int);
CREATE TABLE
postgres=# insert into t2 values(3,4);
INSERT 0 1
postgres=# insert into t2 values(5,6);
INSERT 0 1

Let’s check the replica:

postgres=# select * from t2;
ERROR:  relation "t2" does not exist
LINE 1: select * from t2;

We can see that table t2 was not copied over since it does not have a subscription and therefore does not exist in our replica. This allows us to expose certain tables from a central database to remote ones while keeping other tables private.

But how do alterations to the table translate in a replica? Well let’s see. In the primary database terminal lets alter the table and add some data:

postgres=# alter table t1 add column x int;
ALTER TABLE
postgres=# insert into t1 values (3, 'foo', 42);
INSERT 0 1
postgres=# insert into t1 values (4, 'bar', 10);
INSERT 0 1
postgres=# select * from t1;
 id |  val  | x  
----+-------+----
  1 | hello |   
  2 | world |   
  3 | foo   | 42
  4 | bar   | 10
(4 rows)

Now checking the secondary server:

postgres=# select * from t1;
 id |  val  
----+-------
  1 | hello
  2 | world
(2 rows)

We can see that our secondary server is no longer replicating the primary. Even if we add new data in the same schema to the primary it wont show up in the replica:

postgres=# insert into t1 values (15, 'test');
INSERT 0 1
postgres=# select * from t1;
 id |  val  
----+-------
  1 | hello
  2 | world
(2 rows)

Since we changed the schema of t1 our replica no longer copies data from it since it is now effectively a different table. This is important to remember if you ever change a table’s schema and have replicas running somewhere else.

What happens if we don’t create a table in our replica that matches one in our primary server?

postgres=# CREATE PUBLICATION pub_t2 FOR TABLE t2;
CREATE PUBLICATION
postgres=# CREATE SUBSCRIPTION sub_t2 CONNECTION 'dbname=postgres host=localhost port=5432 user=postgres' PUBLICATION pub_t2;
ERROR:  relation "public.t2" does not exist

We get an error stating that the relation we want to subscribe to doesn’t exist. This can be very confusing as we gave it all the connection information and we know table t2 exists at that location. However, this error is referring to the fact it cannot find table t2 on our replica and therefore cannot create a subscription. So it’s important to remember, we need both databases to contain the same table with the same schema for logical replication to work properly.

Summary

Below is a summary of all the topics we discussed above from a high-level view. If you are still trying to decide what replication method you want to use then this summary could be helpful in making a decision.

Logical vs Physical Replication

LogicalPhysical
Works at the row level, copies changes to individual database rows from the primary to the replica servers. With logical replication, we can choose what tables, schemas or even columns to replicate giving us more granularity.Works at the disk block level, copying data from the primary server to the replicas. This allows it to be much more efficient with both time and space.
Because logical works at the row level we can make backups between different PostgreSQL versions and even between PostgreSQL instances running on different operating systems.We don’t get the granularity of logical replication since the whole database cluster is copied.
Logical replication supports bi-directional or multi-master replication setups.We also need the same PostgreSQL version running on the same operating system to make replicas.
Logical replication is less efficient than physical replicationPhysical replication has two versions, shipping and streaming
Comparison of Logical and Physical Replication

Shipping vs Streaming Replication

StreamingShipping
Sends changes from the primary server to the replicas as they occur. This leads to lower lag and ensures replica is up to date with the primary serverNear real-time, doesn’t send changes as they occur but instead sends WAL files after they are filled or a configurable timeout is reached. Leads to higher latency.
Continuously sends WAL records to the replicas which replay them to stay in synchWorks at the file level, WAL files are archived on the primary server and copied to the replicas. This is done via manual transfer or a script, the WAL files are then replayed to stay in sync with the primary server.
More efficient and less resource intensive doesn’t require copying or archiving WAL filesLess efficient and more resource intensive
Requires persistent connectionNo persistent connection is required, good for environments with spotty connections or high latency.
Replicas are read-only. Useful for load balancing queries, high availability and backups but not for writing.Replicas can be configured to be read-only or read-and-write.
Comparison of Streaming vs Shipping Physical Replication

Asynchronous vs Synchronous Streaming

AsynchronousSynchronous
Replicas are allowed to fall behind the primary server if the primary is busier. If the primary crashes we lose the data that was not replicated.The primary server does not commit until it receives confirmation that a replica has received the transaction. We never lose data if the primary crashes, however, this method can slow down the primary or even stop it if the replica has problems. Also, has a performance impact due to network latency.
Comparison of Asynchronous vs Synchronous Streaming

Conclusion

This concludes this article on setting up server replication with PostgreSQL for a variety of replication methods. First, we went over the actual steps to set up and run each method and then viewed a high-level summary of each method’s features.

References

27.2. log-shipping standby servers. PostgreSQL Documentation. (2023, February 9). Retrieved March 29, 2023, from https://www.postgresql.org/docs/current/warm-standby.html

31.11. quick setup. PostgreSQL Documentation. (2023, February 9). Retrieved March 29, 2023, from https://www.postgresql.org/docs/current/logical-replication-quick-setup.html

B, A. (2022, April 8). [web log]. Retrieved March 29, 2023, from https://scalegrid.io/blog/comparing-logical-streaming-replication-postgresql/.

Levinas, M. (2022, October 10). [web log]. Retrieved March 29, 2023, from https://www.cherryservers.com/blog/how-to-set-up-postgresql-database-replication.

Ravoof, S. (2023, February 17). [web log]. Retrieved March 29, 2023, from https://kinsta.com/blog/postgresql-replication/.