Getting Started
This blog is aimed at beginners who want to practice the fundamentals of database replication in PostgreSQL but who might not have access to a remote server. I believe it is essential when learning a new technology to go through examples on one’s own machine in order to solidify concepts. This can be difficult with replicas as many of the resources readily available assume the user has some PostgreSQL experience and access to another server on which to run the replica. We will not assume any of that here, the only prerequisite is that you already have Postgres installed and can login to the interactive shell.
Prerequisites
For these examples, we used Ubuntu 22.10 to run PostgreSQL 16 (development version). We are going to use as our data directory $HOME/pg/data
. This first example will go over physical replication using streaming, followed by a shorter example with physical replication using shipping, then an example of logical replication, ending with a summary of the different replication types and their features.
Physical Replication With Streaming
Let’s start by logging into the primary server as a superuser, typically this is “postgres” the default user.
psql -U postgres
Once in the shell, create a new role for the replication, create a table in the primary server, insert some data into this table and then logout.
postgres=# CREATE ROLE rep_user WITH REPLICATION LOGIN PASSWORD 'rep_pass';
postgres=# create table t1(a int, b int);
postgres=# insert into t1 values (1,2);
postgres=# \q
Open $HOME/pg/data/postgres.conf
and ensure
listen_addresses = 'localhost'
Then, append the following to the end of $HOME/pg/data/pg_hba.conf
host replication rep_user localhost md5
Run the following command to create a backup of the primary server which will then be used as the data directory for our replica server (located at $HOME/pg/rep
).
$ pg_basebackup -h localhost -U rep_user -X stream -C -S replica_1 -v -R -W -D $HOME/pg/rep
After this directory has been created, open $HOME/pg/rep/postgres.conf
and set the port number to something other than the PostgreSQL default 5432.
port = 5433
While still in postgres.conf
edit the line primary_conninfo
to appear something like the following:
primary_conninfo = 'dbname=postgres user=postgres host=localgost port=5432 sslmode=disable'
The last thing we must do before starting the servers is to create a dummy standby.signal
file in the secondary server’s data folder. Enter the following in the command line:
$ touch $HOME/pg/rep/standby.signal
Open up a second terminal as we are going to start two separate instances of PostgreSQL on our host machine, each on different ports.
In the first terminal, our primary server, enter:
pg_ctl -D $HOME/pg/data start
In the second terminal, our replica server, enter:
pg_ctl -D $HOME/pg/rep start
Each should give an output stating that the server is started. We can confirm this by typing:
$ ps -aux | grep postgres
tristen 29088 0.0 0.1 175596 18580 ? Ss 15:42 0:00 /home/tristen/disk/pgapp/bin/postgres -D /home/tristen/pg/data
tristen 29089 0.0 0.0 175728 2388 ? Ss 15:42 0:00 postgres: checkpointer
tristen 29090 0.0 0.0 175752 2364 ? Ss 15:42 0:00 postgres: background writer
tristen 29092 0.0 0.0 175596 7424 ? Ss 15:42 0:00 postgres: walwriter
tristen 29093 0.0 0.0 177196 4892 ? Ss 15:42 0:00 postgres: autovacuum launcher
tristen 29094 0.0 0.0 177176 4984 ? Ss 15:42 0:00 postgres: logical replication launcher
tristen 29112 0.0 0.1 175596 18456 ? Ss 15:42 0:00 /home/tristen/disk/pgapp/bin/postgres -D /home/tristen/pg/rep
tristen 29113 0.0 0.0 175728 2564 ? Ss 15:42 0:00 postgres: checkpointer
tristen 29114 0.0 0.0 175596 2564 ? Ss 15:42 0:00 postgres: background writer
tristen 29115 0.0 0.0 176348 4488 ? Ss 15:42 0:00 postgres: startup recovering 000000010000000000000014
tristen 29116 0.0 0.0 176136 3664 ? Ss 15:42 0:00 postgres: walreceiver streaming 0/14000110
tristen 29117 0.0 0.0 177336 7060 ? Ss 15:42 0:00 postgres: walsender rep_user 127.0.0.1(39650) streaming 0/14000110
tristen 29126 0.0 0.0 17580 2252 pts/2 S+ 15:42 0:00 grep --color=auto postgres
We can see we have two processes of $HOME/disk/pgapp/bin/postgres
running with the two directories.
Now, we will connect to each of the two instances. First the primary server on our first terminal:
psql -U postgres -p 5432
Then the replica server on our second terminal:
psql -U postgres -p 5433
The -p
flag is used to specify the port. Using this on the primary server is technically redundant since it uses the default port. However, this is good to show that we are indeed connecting to two different ports and thus two different instances of PostgreSQL.
If we enter \d
into the Postgres shell, we can get a list of tables that are in the database. Doing so in the second terminal corresponding to our replica server we can see that our table t1
was indeed copied over.
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | t1 | table | postgres
(1 row)
Let’s create a new table on the primary server to show that it will be copied over automatically to the replica.
On our first terminal enter:
postgres=# create table t2(c int, d text);
CREATE TABLE
postgres=# insert into t2 values (3, 'hello');
INSERT 0 1
In the second terminal enter:
postgres=# select * from t2;
c | d
---+-------
3 | hello
(1 row)
We can see that not only was the new table copied over from the primary server but so was the data we just inserted.
It is important to note that this relationship is only one way. That is, the replica is specifically a copy of the primary server. If we tried to create a table in the replica server we would get an error stating:
ERROR: cannot execute CREATE TABLE in a read-only transaction
This follows logically as we want the replica to be read-only to make the data from the primary server more readily available.
Physical Replication With Shipping
Shipping has a similar setup to streaming, we mostly just change some config files. That being said, on the primary server edit postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /path/to/archive/%f'
Restart the server if it is running and create a backup of the primary server:
$ pg_basebackup -h localhost -U rep_user -X fetch -v -R -W -D $HOME/pg/rep
Open $HOME/pg/rep/postgresql.conf
and edit two lines. Comment out the line starting with archive_command
and uncomment the line starting with restore_command
adding the command to use when restoring after the equality operator. The lines should look like the ones below:
#archive_command = 'cp %p /path/to/archive/%f'
restore_command = 'cp /path/to/archive/%f %p'
Startup both servers like in the streaming example and you should see the replica copying the primary. Note that shipping changes may not appear immediately as they do in streaming. Streaming has higher latency and only sends changes when WAL files are filled to the replicas.
Logical Replication
Logical replication like physical has the same beginning setup with the servers. It involves some different configs but also some manual edits for tables to work properly.
On the primary server edit postgresql.conf
wal_level = logical
Make sure that pga_hba.conf
on the primary server contains the correct connection permissions for the replication user.
host all rep_user localhost md5
Once all the configurations are ready, we want to start both servers and login to the interactive shell. Recall that each server is using a different port number in our example, so make sure you aren’t using the same server twice! In each server’s respective shell, we want to create a table in each with the same name and schema. If the tables are at all different, the replica will not be able to find it and we will get an error. That being said, enter the following in both terminals:
postgres=# create table t1(id int, val text);
Now that we have the same empty table in both databases, we can move on to replicating the table. On the primary server enter the following command to create a publication for table t1
:
postgres=# CREATE PUBLICATION pub_t1 FOR TABLE t1;
This creates a publication
for table t1
which means that the table t1 will be replicated to the secondary server. We can have as many tables as we want to replicate as part of the publication.
Once the publication is created, we can subscribe table t1
in the secondary database to table t1
in the primary database. To do this we must create a subscription in the secondary database as follows:
postgres=# CREATE SUBSCRIPTION sub_t1 CONNECTION 'dbname=postgres host=localhost port=5432 user=postgres' PUBLICATION pub_t1;
Note, that for this example we used both the default user postgres
and the default database postgres
, this is not necessary.
Now that we are subscribed we will see all updates in table t1 in the primary server appear in table t1 of the secondary server. Any other table on the primary server not part of the publication we made will not be copied over to the replica. Let’s see an example of that now, in the primary database terminal insert some data into table t1
:
postgres=# insert into t1 values (1, 'hello');
INSERT 0 1
postgres=# insert into t1 values (2, 'world');
INSERT 0 1
Now let’s see if the table was actually replicated in the secondary server:
postgres=# select * from t1;
id | val
----+-------
1 | hello
2 | world
(2 rows)
And there we have it, the data was successfully copied over to the replica server. But how do we know this is actually doing anything different than Physical replication? Well lets add a new table to our primary database and insert some data:
postgres=# create table t2(c int, d int);
CREATE TABLE
postgres=# insert into t2 values(3,4);
INSERT 0 1
postgres=# insert into t2 values(5,6);
INSERT 0 1
Let’s check the replica:
postgres=# select * from t2;
ERROR: relation "t2" does not exist
LINE 1: select * from t2;
We can see that table t2
was not copied over since it does not have a subscription and therefore does not exist in our replica. This allows us to expose certain tables from a central database to remote ones while keeping other tables private.
But how do alterations to the table translate in a replica? Well let’s see. In the primary database terminal lets alter the table and add some data:
postgres=# alter table t1 add column x int;
ALTER TABLE
postgres=# insert into t1 values (3, 'foo', 42);
INSERT 0 1
postgres=# insert into t1 values (4, 'bar', 10);
INSERT 0 1
postgres=# select * from t1;
id | val | x
----+-------+----
1 | hello |
2 | world |
3 | foo | 42
4 | bar | 10
(4 rows)
Now checking the secondary server:
postgres=# select * from t1;
id | val
----+-------
1 | hello
2 | world
(2 rows)
We can see that our secondary server is no longer replicating the primary. Even if we add new data in the same schema to the primary it wont show up in the replica:
postgres=# insert into t1 values (15, 'test');
INSERT 0 1
postgres=# select * from t1;
id | val
----+-------
1 | hello
2 | world
(2 rows)
Since we changed the schema of t1 our replica no longer copies data from it since it is now effectively a different table. This is important to remember if you ever change a table’s schema and have replicas running somewhere else.
What happens if we don’t create a table in our replica that matches one in our primary server?
postgres=# CREATE PUBLICATION pub_t2 FOR TABLE t2;
CREATE PUBLICATION
postgres=# CREATE SUBSCRIPTION sub_t2 CONNECTION 'dbname=postgres host=localhost port=5432 user=postgres' PUBLICATION pub_t2;
ERROR: relation "public.t2" does not exist
We get an error stating that the relation we want to subscribe to doesn’t exist. This can be very confusing as we gave it all the connection information and we know table t2
exists at that location. However, this error is referring to the fact it cannot find table t2
on our replica and therefore cannot create a subscription. So it’s important to remember, we need both databases to contain the same table with the same schema for logical replication to work properly.
Summary
Below is a summary of all the topics we discussed above from a high-level view. If you are still trying to decide what replication method you want to use then this summary could be helpful in making a decision.
Logical vs Physical Replication
Logical | Physical |
Works at the row level, copies changes to individual database rows from the primary to the replica servers. With logical replication, we can choose what tables, schemas or even columns to replicate giving us more granularity. | Works at the disk block level, copying data from the primary server to the replicas. This allows it to be much more efficient with both time and space. |
Because logical works at the row level we can make backups between different PostgreSQL versions and even between PostgreSQL instances running on different operating systems. | We don’t get the granularity of logical replication since the whole database cluster is copied. |
Logical replication supports bi-directional or multi-master replication setups. | We also need the same PostgreSQL version running on the same operating system to make replicas. |
Logical replication is less efficient than physical replication | Physical replication has two versions, shipping and streaming |
Shipping vs Streaming Replication
Streaming | Shipping |
Sends changes from the primary server to the replicas as they occur. This leads to lower lag and ensures replica is up to date with the primary server | Near real-time, doesn’t send changes as they occur but instead sends WAL files after they are filled or a configurable timeout is reached. Leads to higher latency. |
Continuously sends WAL records to the replicas which replay them to stay in synch | Works at the file level, WAL files are archived on the primary server and copied to the replicas. This is done via manual transfer or a script, the WAL files are then replayed to stay in sync with the primary server. |
More efficient and less resource intensive doesn’t require copying or archiving WAL files | Less efficient and more resource intensive |
Requires persistent connection | No persistent connection is required, good for environments with spotty connections or high latency. |
Replicas are read-only. Useful for load balancing queries, high availability and backups but not for writing. | Replicas can be configured to be read-only or read-and-write. |
Asynchronous vs Synchronous Streaming
Asynchronous | Synchronous |
Replicas are allowed to fall behind the primary server if the primary is busier. If the primary crashes we lose the data that was not replicated. | The primary server does not commit until it receives confirmation that a replica has received the transaction. We never lose data if the primary crashes, however, this method can slow down the primary or even stop it if the replica has problems. Also, has a performance impact due to network latency. |
Conclusion
This concludes this article on setting up server replication with PostgreSQL for a variety of replication methods. First, we went over the actual steps to set up and run each method and then viewed a high-level summary of each method’s features.
References
27.2. log-shipping standby servers. PostgreSQL Documentation. (2023, February 9). Retrieved March 29, 2023, from https://www.postgresql.org/docs/current/warm-standby.html
31.11. quick setup. PostgreSQL Documentation. (2023, February 9). Retrieved March 29, 2023, from https://www.postgresql.org/docs/current/logical-replication-quick-setup.html
B, A. (2022, April 8). [web log]. Retrieved March 29, 2023, from https://scalegrid.io/blog/comparing-logical-streaming-replication-postgresql/.
Levinas, M. (2022, October 10). [web log]. Retrieved March 29, 2023, from https://www.cherryservers.com/blog/how-to-set-up-postgresql-database-replication.
Ravoof, S. (2023, February 17). [web log]. Retrieved March 29, 2023, from https://kinsta.com/blog/postgresql-replication/.
Tristen received his Bachelor of Applied Science in Computer Engineering from the University of British Columbia in May 2023. He joined HighGo Software Inc. as a Software Engineer fresh out of university and is very excited for his engineering journey to begin. His main interests include Machine Learning, Embedded Systems, and Database Management Systems. With experience in C/C++ and advanced relational databases, Tristen hopes to contribute significantly to the PostgreSQL community as he continues to learn and grow his expertise.
Recent Comments