How to set up Lustre file system and run Postgres on it


1. Overview

Like PostgreSQL, Lustre is an open source project that started about 20 years ago. According to Wikipedia, Lustre is a parallel distributed file system designed for large-scale cluster computing, with native Remote Direct Memory Access (RDMA) support. Lustre file systems are scalable and can be part of multiple computer clusters with tens of thousands of client nodes, tens of petabytes (PB) of storage on hundreds of servers, and more than a terabyte per second (TB/s) of aggregate I/O throughput. This blog explains how to set up a simple Lustre file system on CentOS 7 and run PostgreSQL on it.

2. Lustre file system

To deliver parallel file access and improve I/O performance, Lustre separates metadata services from data services. From a high-level architecture point of view, a Lustre file system contains the following basic components:

  • Management Server (MGS), provides configuration information about how the file system is configured, notifies clients about changes in the file system configuration and plays a role in the Lustre recovery process.
  • Metadata Server (MDS), manages the file system namespace and provides metadata services to clients such as filename lookup, directory information, file layouts, and access permissions.
  • Metadata Target (MDT), stores metadata information, and holds the root information of the file system.
  • Object Storage Server (OSS), stores file data objects and makes the file contents available to Lustre clients.
  • Object Storage Target (OST), stores the contents of user files.
  • Lustre Client, mounts the Lustre file system and makes the contents of the namespace visible to the users.
  • Lustre Networking (LNet), a network protocol used for communication between Lustre clients and servers, with native RDMA support.

If you want to know more about how Lustre works inside, you can refer to Understanding Lustre Internals.

3. Set up Lustre on CentOS 7

To set up a simple Lustre file system for PostgreSQL, we need four machines: an MGS-MDS-MDT server, an OSS-OST server, and two Lustre clients, client1 and client2 (the Postgres servers). In this blog, I used three CentOS 7 virtual machines with the network settings below:

Client1/PG Server:
Client2/PG Server:

3.1. Install Lustre

To avoid dealing with firewall and SELinux policy issues, I simply disabled both: set SELINUX=disabled in /etc/selinux/config, then run the commands below,

systemctl stop firewalld
systemctl disable firewalld
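
The SELinux edit above can also be scripted. The following is only a minimal sketch: the function name disable_selinux_config and the .bak backup file are my own choices for illustration, not part of any Lustre or CentOS tooling.

```shell
# Hypothetical helper: set SELINUX=disabled in an SELinux config file.
# The path defaults to /etc/selinux/config; a backup copy is kept as FILE.bak.
disable_selinux_config() {
    cfg="${1:-/etc/selinux/config}"
    cp "$cfg" "$cfg.bak" || return 1
    # Rewrite only the active SELINUX= line. SELINUXTYPE= and commented
    # lines do not match the pattern and are left alone.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg.bak" > "$cfg"
}
```

Run it as root with no argument to edit /etc/selinux/config. The change in that file only takes effect after a reboot; setenforce 0 switches the running system to permissive mode in the meantime.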

Add Lustre release information to /etc/yum.repos.d/lustre.repo

name=CentOS-$releasever - Lustre

name=CentOS-$releasever - Ldiskfs

name=CentOS-$releasever - Lustre

Then update yum and install the filesystem utilities e2fsprogs to deal with ext4

yum update && yum upgrade -y e2fsprogs

If there are no errors, install the Lustre server and tools with yum install -y lustre-tests

3.2. Set up lnet network

Depending on your network interface setup, add the matching lnet configuration. For example, all three of my CentOS 7 machines have a network interface named enp0s8, so I added the line options lnet networks="tcp0(enp0s8)" to /etc/modprobe.d/lnet.conf as my Lustre lnet network configuration.
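
If the interface name differs between machines, writing the file by hand gets repetitive. Here is a small sketch of how the line could be generated; write_lnet_conf and its argument order are assumptions of mine, and only the options line format comes from the configuration above.

```shell
# Hypothetical helper: write the lnet module options for one interface.
# Arguments: interface name, LNet network name (default tcp0), target file
# (default /etc/modprobe.d/lnet.conf). The target file is overwritten.
write_lnet_conf() {
    iface="$1"
    net="${2:-tcp0}"
    conf="${3:-/etc/modprobe.d/lnet.conf}"
    if [ -z "$iface" ]; then
        echo "usage: write_lnet_conf IFACE [NET] [FILE]" >&2
        return 1
    fi
    printf 'options lnet networks="%s(%s)"\n' "$net" "$iface" > "$conf"
}
```

For the setup in this blog, the call would be write_lnet_conf enp0s8, run as root.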

Then we need to load the lnet driver into the kernel and start the lnet network by running the commands below,

modprobe lustre
lsmod | grep lustre
modprobe lnet
lsmod | grep lnet
lctl network up
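
One caveat with lsmod | grep is that it matches anywhere on the line, so grep lustre also matches other modules that list lustre in their "Used by" column. A stricter check, as a sketch only (module_loaded is a name I made up), compares the first field of /proc/modules, which is where lsmod gets its data:

```shell
# Hypothetical helper: succeed only if a module with exactly this name is loaded.
# The second argument exists only so the check can be pointed at a sample file.
module_loaded() {
    mod="$1"
    modules_file="${2:-/proc/modules}"
    # The first field of each /proc/modules line is the module name.
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}
```

For example, module_loaded lnet && echo "lnet is loaded".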

You can check whether the lnet network is running on your Ethernet interface with the command lctl list_nids, which prints the NIDs configured on the node (for the tcp0 setup above, typically the interface's IP address followed by @tcp).

You can ping other Lustre servers over the lnet network by running lctl ping followed by the NID of the target server. If the lnet network is working, the command prints the NIDs of the remote node.
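
With several servers, the ping can be wrapped in a loop. This is a rough sketch: ping_lnet_peers is a hypothetical name, it assumes lctl ping exits with a non-zero status when the peer cannot be reached, and the NIDs in the usage line are placeholders rather than the addresses used in this setup.

```shell
# Hypothetical helper: run "lctl ping" against each NID given as an argument.
# Prints one status line per NID and returns non-zero if any peer fails.
# Assumption: lctl ping exits non-zero when the peer is unreachable.
ping_lnet_peers() {
    failed=0
    for nid in "$@"; do
        if lctl ping "$nid" >/dev/null 2>&1; then
            echo "ok $nid"
        else
            echo "unreachable $nid"
            failed=1
        fi
    done
    return "$failed"
}
```

Usage would look like ping_lnet_peers 192.0.2.1@tcp 192.0.2.2@tcp, with the real NIDs of the MGS/MDS/MDT and OSS/OST servers substituted.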

3.3. Set up MGS/MDS/MDT and OSS/OST servers

To set up the storage for the MGS/MDS/MDT server, I added one dedicated virtual disk (/dev/sdb), created one partition (/dev/sdb1) and formatted it as ext4.

fdisk /dev/sdb
mkfs -t ext4 /dev/sdb1

Repeat the same process on the OSS/OST server to add the disk that stores the actual file data.

If everything goes fine, it is time to mount the disks on the Lustre servers. First, we format and mount the disk on the MGS/MDS/MDT server by running the commands below,

mkfs.lustre --reformat --fsname=lustrefs --mgs --mdt --index=0 /dev/sdb1
mkdir /mgsmdt_mount
mount -t lustre /dev/sdb1 /mgsmdt_mount
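
Before moving on, it can help to confirm that the target is really mounted as a Lustre file system. A minimal sketch, assuming the mount shows up in /proc/mounts with the type lustre; is_lustre_mounted is a helper name of my own:

```shell
# Hypothetical helper: succeed if MOUNTPOINT is mounted with type "lustre".
# The second argument exists only so the check can be run against a sample file.
is_lustre_mounted() {
    mnt="$1"
    mounts_file="${2:-/proc/mounts}"
    # /proc/mounts fields: device, mount point, filesystem type, options, ...
    awk -v m="$mnt" '$2 == m && $3 == "lustre" { found = 1 } END { exit !found }' "$mounts_file"
}
```

For example, is_lustre_mounted /mgsmdt_mount || echo "MGS/MDT target is not mounted". The same check can be reused for /ostoss_mount on the OSS/OST server and for /mnt/lustre on the clients.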

Second, we format and mount the disk on the OSS/OST server using the commands below,

mkfs.lustre --reformat --ost --fsname=lustrefs --mgsnode= --index=0 /dev/sdb1
mkdir /ostoss_mount
mount -t lustre /dev/sdb1 /ostoss_mount

3.4. Set up Lustre clients

After the Lustre servers are set up, we can simply mount the Lustre file system on each client by running the commands below. The mount source, which goes before the mount point, has the form <MGS NID>:/lustrefs,

mkdir /mnt/lustre
mount -t lustre /mnt/lustre

If there are no errors, you can verify the mount by creating a text file with some content on one client and checking it from the other client.
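
The two-client check can be written down as a pair of small functions. This is only a sketch; the function names and the marker file name lustre_marker.txt are made up for illustration.

```shell
# Hypothetical helpers for the two-client visibility check.
# write_marker is meant to run on one client, check_marker on the other.
write_marker() {
    dir="$1"
    token="$2"
    printf '%s\n' "$token" > "$dir/lustre_marker.txt"
}

check_marker() {
    dir="$1"
    token="$2"
    # Succeeds only if the marker file exists and holds exactly the token.
    [ -f "$dir/lustre_marker.txt" ] && [ "$(cat "$dir/lustre_marker.txt")" = "$token" ]
}
```

On client1, run write_marker /mnt/lustre hello-from-client1; on client2, run check_marker /mnt/lustre hello-from-client1 && echo "shared file is visible".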

3.5. Set up Postgres on Lustre file system

As there are so many tutorials about how to set up Postgres on CentOS, I will skip this part. Assuming you have installed Postgres either from an “official release” or compiled from the source code yourself, run the tests below from client1 (the SQL statements are entered in a psql session),

initdb -D /mnt/lustre/pgdata
pg_ctl -D /mnt/lustre/pgdata -l /tmp/logfile start
create table test(a int, b text);
insert into test values(generate_series(1, 1000), 'helloworld');
select count(*) from test;
pg_ctl -D /mnt/lustre/pgdata -l /tmp/logfile stop

Then run the commands below from client2,

pg_ctl -D /mnt/lustre/pgdata -l /tmp/logfile start
select count(*) from test;
pg_ctl -D /mnt/lustre/pgdata -l /tmp/logfile stop

From the simple tests above, you can confirm that the table created and the records inserted by client1 are stored on the remote Lustre file system: once the Postgres server is stopped on client1, you can start a Postgres server on client2 and query all the records inserted by client1.
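
Note that the test relies on only one Postgres server using the data directory at a time. A running server keeps a postmaster.pid file in its data directory and removes it on a clean shutdown, so a coarse guard before starting on the other client could look like the sketch below. The function name pgdata_looks_free is hypothetical, and a stale file left by a crash would also trigger it, so treat this as a sanity check rather than a real cluster lock.

```shell
# Hypothetical guard: fail if the data directory looks like it is in use.
# A running PostgreSQL server keeps postmaster.pid in its data directory.
pgdata_looks_free() {
    pgdata="$1"
    if [ -e "$pgdata/postmaster.pid" ]; then
        echo "postmaster.pid found in $pgdata; is a server still running on another client?" >&2
        return 1
    fi
    return 0
}
```

On client2 this could be used as pgdata_looks_free /mnt/lustre/pgdata && pg_ctl -D /mnt/lustre/pgdata -l /tmp/logfile start.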

4. Summary

In this blog, I explained how to set up Lustre, a parallel distributed file system, in a local environment and verify it with PostgreSQL servers. I hope this blog helps when you want to evaluate distributed file systems.