While managing a small team of development resources working on PostgreSQL development, I sometimes get resources in my team that have good development experience but are new to PostgreSQL. I have developed a short set of training instructions in order to get these resources started with PostgreSQL and get them to familiarise themselves with Postgres and its internals. The purpose of this blog is to share these instructions so it can benefit others in a similar situation. The instructions involve going through a lot of documentation, white-papers, online books, it also includes few development exercises that can be helpful in understanding PostgreSQL codebase. I have found these helpful in the initial training for resources that are new to PostgreSQL, putting them in the blog so they are also helpful to others.
Online Learning Resources
For someone who is new to PostgreSQL, the obvious starting point is understanding PostgreSQL functionality. I would recommend the following resources for reading about PostgreSQL functionality.
This is a really good place for Postgres tutorials, the tutorial available on this site vary from basic “Getting started with PostgreSQL” to complex features like common table expressions, partition tables, etc. It also contains tutorials for Postgres client’s tools, programming interfaces, etc.
This presentations and tutorial available at Bruce Momjain site are also very useful, this site contains presentations, online books are other material on all sorts of topics related to PostgreSQL. Whether it is talking about query processing, Postgres internal, horizontal scalability with sharding, security, etc, this site contains a lot of useful training material related to PostgreSQL.
There is of-course community official documentation for every release of PostgreSQL. The difference between each release document is new features added to the release or changes to existing features. You can get documentation for any release using the main docs link below.
The above resources should give you a really good insight into PostgreSQL features and functionality. The next place to start is understanding Postgres internals and how PostgreSQL community goes about feature development.
I have found this online book (link below) really good in understanding PostgreSQL internals :
It is very useful in understanding the components involved in query processing and taking a deep dive into PostgreSQL internals for storage, WAL and memory management.
PostgreSQL has one of the best and active community with some of the best techies, you can subscribe to the community mailing list using the link below.
The mailing list archives are also available at the same link, the most interesting one for development in psql-hackers where most of the development discussion takes place. The PostgreSQL development team lives in psql-hackers mailing list. You can simply click on any of the mailing lists and search for a particular feature like “TDE” and get all the email threads related to TDE.
This is the best place to understand how community development takes place. For new features, you can see the process of starting from submitting a proposal along with POC patches and getting buy-in from other community members. There is a lot to learn from a community mailing list and understanding how the community goes about in doing PostgreSQL development.
Basic development exercises
It is time to indulge in some development exercises to develop a better understanding of the codebase. It is out of scope of this blog to explain how each one is done, I believe it is already well documented in other online resources.
The first think that I would recommend is getting an understanding of the PostgresSQL regression suite, how it is executed and how to add a new test case to the regression test suite. PostgreSQL has a comprehensive set of test cases that live in “src/test/regress/sql”, last I looked at the master branch the number of test cases files was around 197. The test cases are generally divided based on the functionality i.e. limit.sql, json.sql, etc and for data-type boolean.sql, etc. Each test case file contains detailed test cases for testing the desired functionality.
There are other test cases written in the TAP framework for testing tools and utilities that can’t be tested with just the SQL interface. This is also needed when you need to test with multiple server configurations or need to testing replication/failover or testing backup etc. Please see the link below TAP (Test any protocol) based stated as a PERL based testing framework but later extended to allow writing test cases in other languages.
So someone coming new to PostgreSQL needs to understand how to run the regression test suite and how to add new test cases to regression. We can add test cases for partitioning or any other functionality, make it part of regression schedule and run the regression.
The second exercise is to understand how PostgreSQL extensions work and how to add new extensions to PostgreSQL. The extensions are added to PostgreSQL for making it extensible, The contrib/directory shipped with the source code contains several extensions, which are described in PostgreSQL documentation. Other extensions are developed independently, like PostGIS. Even PostgreSQL replication solutions can be developed externally, Slony is a popular replication solution that is developed outside of the core.
The user can add their own extensions to PostgreSQL according to there needs. This is a really useful that shows how to add a user-defined extension to PostgreSQL.
The third exercise is slightly more challenging but it is very useful in understanding the internals of PostgreSQL. The exercise is to basically follow a SELECT query in PostgreSQL internals and see how the query changes as it passes through the internal components of PostgreSQL. This is done using a debugger and compiling PostgreSQL source with enable-debug and watching the state by placing appropriate breakpoints in the code.
Another real blog is written by Cary (HighGo) that shows how this is done…