Introduction
In this blog, we’ll go over the different methods we can use to encrypt and decrypt data in a PostgreSQL database. Having some experience with Linux and PostgreSQL is necessary, while experience with encryption is not but is nice to have. This blog was written using PostgreSQL 16 running on Ubuntu 23.04. First I’ll go over some motivations for encryption and its importance to data security, then look at a subset of the functions offered by PostgreSQL to implement encryption.
Background
While we rarely interact with encryption in our day-to-day lives, it is fundamental for the security of our sensitive information such as banking, healthcare and, much much more. Typically, encryption is abstracted away from the end user, obscuring all the complex math and algorithms, allowing us to simply enter our password and have everything work. However, there are tradeoffs to these abstractions as we have to trust the person who decrypts our data to not make a copy of it or any other malicious behaviour. In this blog, we will go over the different methods PostgreSQL uses to encrypt data and the tradeoffs we encounter when using them.
Encryption Options
The Postgres documentation lists 6 levels of encryption supported by the database software. These are:
- Password Encryption
- Probably the simplest and most commonly used form of encryption. Before being sent to the server, the Postgres client will hash the user password before storing it inside the database. This means the plaintext password is never stored on the server making it quite difficult for a would-be attacker to get it.
- Encryption For Specific Columns
- The pgcrypto module supplies cryptographic functions for encrypting data stored in specific columns of the database. To decrypt the data, the client must send over the key, and the data is unencrypted on the server side. This means that both the key and data are briefly exposed on the server’s end to anyone with elevated permissions (like a database admin).
- Data Partition Encryption
- This method does not pertain to PostgreSQL specifically but rather refers to the underlying operating system using encryption when writing data to disk. This means the Postgres Server has drive-level encryption to stop anyone from reading data should they get physical access to the server.
- Encrypting Data Across A Network
- This refers to the different methods that PostgreSQL can be configured with to securely transfer data across the network. Both SSL and GSSAPI can be configured in the pg_hba.conf file by specifying a host and its encryption. SSH is also a commonly used protocol to connect over a network that PostgreSQL supports.
- SSL Host Authentication
- For this level, both the client and server must be setup to exchange SSL certificates in an SSL/TLS handshake. Once working, this method prevents a possible attacker from pretending to be the server and gaining access to restricted information, also known as a Man-in-the-middle attack.
- Client-Side Encryption
- The final, and possibly most secure option is for a client to encrypt the data themselves before ever sending it to the server. This means the client must manage all encryption and decryption on their end but also removes the possibility of a malicious administrator having the ability to read your data.
Encryption Functions
All of these functions and more are documented in the PostgreSQL documentation on the pgcrypto module.
General Hashing
The Digest function:
digest(data text, type text) returns bytea
This function transforms the data stored in the variable ‘data’ into a binary hash represented as a byte array based on the encryption method specified in type.
data: The input data we want to compute the binray hash on
type: The hashing algorithm to use (supported: md5, sha1, sha224, sha256, sha384 and sha512)
return: The resulting hash as a byte array
Raw Encryption
These functions simply run a cipher over the inputted data and do not provide any sort of management of keys or initialization vectors (IV). All of this management is expected for the user to handle and is generally discouraged for real-world use.
encrypt(data bytea, key bytea, type text) returns bytea
decrypt(data bytea, key bytea, type text) returns bytea
As we can see both the encrypt and decrypt functions take the same arguments and return the same type. The only difference is that ‘data’ is a plaintext byte array in encrypt() while in decrypt() ‘data’ is a byte array of the encrypted data.
data: A byte array representing the plaintext (encrypt) or the encrypted data (decrypt) to be transformed
key: The key to use in encrypting or decrypting the data
type: The encryption algorithm, mode and, padding to use. This ‘type’ is different than that in general hashing as we have more options to define. These options are of the form algorithm – mode/pad:padding. For example, a decrypt function using AES-CBC encryption with no padding would look like encrypt(data, 'mykey', 'aes-cbc/pad:none')
.
Raw encryption also has support for IVs in the form of two more functions:
encrypt_iv(data bytea, key bytea, iv bytea, type text) returns bytea
decrypt_iv(data bytea, key bytea, iv bytea, type text) returns bytea
These functions are nearly identical to their non-IV counterparts but contain an extra parameter for defining an initialization vector to the algorithm.
Conclusion
In this blog, the many features PostgreSQL offers for encrypting and decrypting data in and around your database. First, we talked about the importance of encryption and the tradeoffs we make when using it. Then we looked at the different levels of encryption that Postgres supports and their functionality within the database. Finally, we looked at a few implementations of encryption that come with Postgres, their use cases and, how to use them. Overall, encryption is critically important to our security online and necessary to understand if you plan on developing anything that requires a network connection. I hope this blog helped your understanding of encryption and how it functions within PostgreSQL and will motivate you to implement these security fundamentals in your next project.
Tristen received his Bachelor of Applied Science in Computer Engineering from the University of British Columbia in May 2023. He joined HighGo Software Inc. as a Software Engineer fresh out of university and is very excited for his engineering journey to begin. His main interests include Machine Learning, Embedded Systems, and Database Management Systems. With experience in C/C++ and advanced relational databases, Tristen hopes to contribute significantly to the PostgreSQL community as he continues to learn and grow his expertise.
Recent Comments