PGConf.dev 2024: Shaping the Future of PostgreSQL in Vancouver


Introduction

For the first time ever, PGConf.dev (previously known as PGCon from 2007 to 2023) took place in the picturesque city of Vancouver, Canada. This rebranded conference brings a fresh perspective and several new additions, elevating the experience beyond its predecessor. While PGCon was traditionally more developer-focused, attracting experienced developers, contributors, and researchers from around the world to tackle “wicked hard” problems, PGConf.dev broadens its scope to emphasize community contributions alongside technical discussions.

With a new conference organization committee (primarily led by Jonathan Katz and Melanie Plageman), PGConf.dev 2024 strikes a balance between preserving its rich history of in-depth technical sessions and fostering a more inclusive and community-oriented environment. New sessions such as “Intro to Hacking on Postgres”, “Advanced Patch Feedback Session,” and “Making PostgreSQL Hacking More Inclusive” are specifically designed to guide and inspire users and developers to become active contributors to the PostgreSQL community.

Hosted against the stunning backdrop of Vancouver, attendees are treated not only to insightful discussions and learning opportunities but also to the city’s vibrant culture and scenic beauty, from the bustling waterfront and lush parks to its renowned food options (see Highgo’s list of recommended restaurants). Vancouver offers a perfect balance between professional enrichment and memorable experiences, making PGConf.dev 2024 an event to remember.

About PGConf.dev

PostgreSQL Development Conference, aka “PGConf.dev”, is a conference where users, developers, and community organizers come together to focus on PostgreSQL development and community growth for the years to come.

As mentioned, PGConf.dev is the successor to PGCon. It focuses on creating a fun and constructive environment for PostgreSQL contributors around the world to meet, collaborate, and learn from each other, while retaining the technical depth of its predecessor.

Unlike PGCon, PGConf.dev places more emphasis on community and socialization, to guide developers and users toward becoming potential contributors or even committers. The additions include:

  • A 10k seawall run for people to meet while exercising along the beautiful Vancouver waterfront.
  • A conference-wide Telegram group chat to message other participants and receive conference announcements in real time.
  • Patch review workshops to sit down with PostgreSQL contributors and review and improve your patch and email engagement.
  • A special session, “Making PostgreSQL Hacking More Inclusive”, where some of the new PostgreSQL committers were invited to share their journeys in becoming committers.
  • A special session, “Intro to Hacking on Postgres”, to help new developers learn how the PostgreSQL community works.
  • A special keynote session with an invited speaker.

PGConf.dev was a success, thanks to the hard work of the organization committee:

  • Amit Kapila
  • Dan Langille
  • Jonathan Katz
  • Daniel Gustafsson
  • Magnus Hagander
  • Melanie Plageman
  • Paul Ramsey
  • Steve Singer

Sponsors

PGConf.dev 2024 was sponsored by:


Conference Highlights

A Warm Welcome

Hosted at Simon Fraser University’s (SFU) downtown Vancouver campus, the conference was officially kicked off by its two primary organizers, Jonathan Katz and Melanie Plageman.

Give them a round of applause for a conference well organized.

Yurii Rashkovskii

As the founder of Omnigres, which aims to improve the developer experience of the current PostgreSQL extension ecosystem, Yurii gave a talk, “Pushing Boundaries with Extensions: For Extensions”, explaining that the Postgres extension ecosystem has been growing rapidly, but that this growth also brings some limitations and problems.

Some of the enhancements Yurii is working on include, but are not limited to:

  • Omni manifest – a dependency resolver. When an extension depends on one or more other extensions, Omni manifest tries to download the dependencies from known extension repositories, such as PGXN.
  • Omni migration – handles differences caused by extension version upgrades (functions, data types, etc.).
  • Atomic switchboard – Omni prevents multiple background workers from starting at the same time.
  • Transaction awareness – the registration of hooks can be protected by transactions; if a rollback occurs, the hooks are automatically uninstalled.
  • Cluster scoping – Omni can use different hooks for different databases without restarting. Today, changing hooks in PostgreSQL requires restarting and reloading the extension.
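The dependency-resolution idea behind a manifest can be sketched roughly as a topological sort. Below is a toy illustration in Python (all names here are invented for illustration; this is not Omnigres code):

```python
# Toy sketch of manifest-style dependency resolution: install each
# extension's dependencies before the extension itself, by visiting
# the dependency graph depth-first (a topological order).
def install_order(manifest):
    order, seen = [], set()

    def visit(ext):
        if ext in seen:
            return
        seen.add(ext)
        for dep in manifest.get(ext, []):
            visit(dep)          # dependencies first
        order.append(ext)       # then the extension itself

    for ext in manifest:
        visit(ext)
    return order

# hypothetical manifest: "myext" depends on pgcrypto and hstore
manifest = {"myext": ["pgcrypto", "hstore"], "pgcrypto": [], "hstore": []}
print(install_order(manifest))  # ['pgcrypto', 'hstore', 'myext']
```

A real resolver would also handle version constraints and download the packages, but the ordering problem at its core looks like this.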

Omnigres is open source; check out its repository on GitHub.

Matthias Van De Meent

Matthias is a senior engineer at Neon. In his talk, “Shaving Off Bytes at any Scale: Space Savings in Various Subsystems of PostgreSQL”, he surveyed where storage space can be saved or optimized in PostgreSQL’s submodules.

Though he did not describe exact approaches to optimizing the space, he pointed out several places inside PostgreSQL subsystems with substantial “wasted space” that can potentially be optimized, such as:

  • catalogs → pg_attribute: 104 bytes of header information, most of it duplicated from other places.
  • table storage – visibility information: useless once a tuple is marked “frozen”.
  • data padding – space wasted when the system adds alignment padding around a boolean, which needs only 1 byte to represent true or false.
  • no page-level compression – lots of potential space savings here.
  • WAL – empty WAL records occupy space as well.
  • TOAST updates – very expensive in terms of space and memory. Can we make a simple API to directly append data?
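The data-padding point is easy to quantify. Here is a rough sketch of how alignment padding wastes space around a 1-byte boolean (a toy illustration, not PostgreSQL’s actual tuple-forming code):

```python
# Toy illustration of alignment padding: a 1-byte boolean column
# followed by a 4-byte integer wastes 3 padding bytes per row,
# because the integer must start on a 4-byte boundary.
def padded_offset(offset, align):
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align

# hypothetical row layout: (bool, int4)
offset = 0
offset += 1                          # bool occupies 1 byte
aligned = padded_offset(offset, 4)   # int4 needs 4-byte alignment
padding = aligned - offset
print(padding)  # 3 bytes wasted per row
```

Multiplied across billions of rows, such per-row padding is exactly the kind of “wasted space” the talk highlighted.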

Thomas Munro

Microsoft’s Thomas is one of the main committers in the PostgreSQL community. In his talk “Streaming I/O and Vectored I/O”, he explained that, by default, PostgreSQL reads and writes data files in 8KB blocks each time it calls the “ReadBuffer()” function, and that the underlying kernel layer cannot optimize beyond this limit.

He is exploring a new “streaming” programming model, which relies on callback functions to indicate the next required blocks in advance and then extracts buffers from the “stream”. With this approach, the streaming infrastructure can foresee future needs as much as possible, resulting in a significant increase in buffer I/O performance.
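The callback-driven model can be sketched as follows (the names and structure here are hypothetical, not PostgreSQL’s actual read-stream API):

```python
# Minimal sketch of a callback-driven "streaming" read model: the
# stream asks a callback for upcoming block numbers, so it can issue
# one larger, batched (vectored) read instead of many 8KB reads.
class ReadStream:
    def __init__(self, next_block_cb, batch_size=4):
        self.next_block = next_block_cb   # callback announcing future blocks
        self.batch = batch_size
        self.buffers = []

    def _fill(self):
        blocks = []
        while len(blocks) < self.batch:
            b = self.next_block()
            if b is None:                 # callback exhausted
                break
            blocks.append(b)
        # one vectored read for the whole batch (simulated here)
        self.buffers.extend(f"buffer-for-block-{b}" for b in blocks)

    def read_next(self):
        if not self.buffers:
            self._fill()
        return self.buffers.pop(0) if self.buffers else None

blocks = iter([10, 11, 12])
stream = ReadStream(lambda: next(blocks, None))
print(stream.read_next())  # buffer-for-block-10
```

The point is that because the stream knows blocks 11 and 12 are coming, it can fetch them together with block 10 rather than one at a time.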

This early work has already been committed; its commitfest entry has the details. Feel free to check it out.

Robert Haas

Robert, a PostgreSQL core member from EDB, explained how autovacuum can go wrong and how to improve it in his talk “How Autovacuum Goes Wrong: And Can We Please Make It Stop Doing That?“.

Robert covered some of autovacuum’s major failure modes, including slow vacuuming, stuck vacuums, repeated vacuuming, and autovacuum skipping work, and discussed the feasibility of improving autovacuum to avoid these problems.

These are important metrics and information for database administrators when autovacuum starts to have negative impacts on database operations.

Some of the future improvement suggestions from Robert are:

  • automatically tune the cost limit and worker counts based on data backlog size.
  • prioritization.
  • avoid useless vacuuming when something is holding back xmin.
  • rethink scale factor mechanism for large tables.
  • goal-based vacuum rather than trigger-based
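The first suggestion, tuning the cost limit from the backlog, can be illustrated with a toy calculation (the formula and numbers below are invented for illustration; PostgreSQL does not do this today):

```python
# Hypothetical sketch of backlog-driven autovacuum tuning: grant a
# larger vacuum cost limit (I/O budget) as the dead-tuple backlog
# grows, capped at a maximum. All constants are invented.
def suggested_cost_limit(dead_tuples, base_limit=200, max_limit=10000):
    # scale the budget by how many millions of dead tuples are waiting
    scaled = base_limit * max(1, dead_tuples // 1_000_000)
    return min(scaled, max_limit)

print(suggested_cost_limit(500_000))      # 200   (small backlog: default pace)
print(suggested_cost_limit(5_000_000))    # 1000  (larger backlog: vacuum faster)
print(suggested_cost_limit(100_000_000))  # 10000 (huge backlog: capped maximum)
```

This captures the spirit of “goal-based” vacuuming: the pace is derived from how far behind vacuum is, not from a fixed trigger.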

Alena Rybakina

(Figures from the talk: how AQO integrates with the PostgreSQL planner module, and how AQO feeds learning results back to the Path Generator.)

Alena, from Postgres Professional, in her talk “Adaptive Query Optimization in PostgreSQL”, explained the mechanism of Adaptive Query Optimization (AQO), which makes PostgreSQL’s planner module much smarter: it is able to “learn” from the user’s query patterns and “adjust” plans to speed up query execution performance.

AQO has a built-in “planner memory”, which is populated by analyzing the query execution tree and saving the actual cardinality. If the problem situation is repeated again (the same query or another query with the same query subtree), the planner will use this knowledge, which may give it a chance to build the right plan, thus speeding up the query execution.
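The “planner memory” idea can be sketched as a lookup table keyed by query subtree (a toy illustration only, not AQO’s real implementation):

```python
# Toy sketch of a "planner memory": actual row counts observed at
# execution time are stored per query subtree signature, and later
# lookups replace the optimizer's static estimate.
class PlannerMemory:
    def __init__(self):
        self.learned = {}   # subtree signature -> actual cardinality

    def record(self, signature, actual_rows):
        """Save the true cardinality observed after execution."""
        self.learned[signature] = actual_rows

    def estimate(self, signature, default_estimate):
        """Prefer the learned cardinality when this subtree was seen before."""
        return self.learned.get(signature, default_estimate)

mem = PlannerMemory()
# after running a query, record what the scan actually returned
mem.record(("scan", "orders", "status='open'"), 42)
# next time the same subtree appears, the planner uses 42, not 1000
print(mem.estimate(("scan", "orders", "status='open'"), 1000))  # 42
print(mem.estimate(("scan", "users", "id=1"), 1))               # 1
```

Better cardinalities are what give the planner “a chance to build the right plan”, as the talk put it.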

AQO dynamically changes the execution plan according to the learning status to achieve the best execution results.

AQO is currently limited to single-process execution plans; if the planner selects multiple processes to execute a statement, AQO needs more time to learn, and the effect is not as good as with a single process.

AQO is open source and can be found here.

Masahiko Sawada

Masahiko from AWS, in his talk “PostgreSQL Meets ART – Using Adaptive Radix Tree to Speed Up Vacuuming”, explained the benefits of the Adaptive Radix Tree (ART), originally published in the paper “The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases” by Viktor Leis, Alfons Kemper, and Thomas Neumann.

ART has been committed to the PostgreSQL 17 development branch and brings a significant boost in vacuum performance. Instead of using an array to store TID values during vacuum, PostgreSQL now uses a “TID store” data structure based on ART, which has the following benefits:

  • occupies less space and memory.
  • removes the 1GB array size limit.
  • makes TID lookup faster, thus faster vacuum.
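The idea behind the TID store can be sketched as follows (a heavily simplified illustration, not the committed PostgreSQL code, which uses a full adaptive radix tree rather than a plain dictionary):

```python
# Simplified sketch of the TID-store idea: group dead tuple IDs
# (block, offset) by block number and keep each block's offsets as a
# bitmap, instead of one flat sorted array of TIDs.
class TidStore:
    def __init__(self):
        self.blocks = {}  # block number -> offset bitmap (int)

    def add(self, block, offset):
        """Record a dead tuple at (block, offset)."""
        self.blocks[block] = self.blocks.get(block, 0) | (1 << offset)

    def contains(self, block, offset):
        # fast per-block lookup instead of a binary search over a huge array
        return bool(self.blocks.get(block, 0) & (1 << offset))

store = TidStore()
store.add(100, 3)
store.add(100, 7)
print(store.contains(100, 3))  # True
print(store.contains(101, 3))  # False
```

Grouping by block is what yields the space savings and faster lookups; the real structure also removes the old 1GB array limit because it grows dynamically.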

This is a new feature for PostgreSQL 17, and Masahiko mentioned that the ART testing is not yet mature; he urged the community to help test it further and improve it where possible!

Bruce Momjian

Bruce, in his talk “PostgreSQL and the Artificial Intelligence Landscape” introduced the ideas of:

  • Artificial Intelligence
  • Machine learning
  • Deep Learning

While it is possible to do all of the above outside of a database, he explained the advantages of doing them inside the database close to where the data is stored.

Bruce demonstrated how to select attributes, create neurons, create training data and provide feedback to machine learning by adjusting weights using the plperl-based extension in PostgreSQL.
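Bruce’s demo ran inside PostgreSQL via plperl; the same “adjust weights from feedback” loop he walked through can be sketched in Python as a single neuron learning the logical AND function (an illustrative analog, not his actual extension code):

```python
# Minimal single-neuron (perceptron) sketch: predict, compare with the
# target, and nudge the weights and bias by the error -- the feedback
# loop Bruce demonstrated with training data inside the database.
def train(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            out = 1 if sum(wi * xi for wi, xi in zip(w, inputs)) + bias > 0 else 0
            err = target - out
            # adjust weights in the direction that reduces the error
            w = [wi + lr * err * xi for wi, xi in zip(w, inputs)]
            bias += lr * err
    return w, bias

# training data for logical AND
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(samples)
preds = [1 if w[0] * x + w[1] * y + b > 0 else 0
         for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(preds)  # [0, 0, 0, 1]
```

In Bruce’s version the attributes, neurons, and weights live in SQL tables, but the arithmetic is the same.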

Performing AI-related tasks requires a lot of data and computational resources, and GPUs are designed to efficiently perform simultaneous repetitive computations. It is possible to harness the power of the GPU using the PG-Strom extension, which adds GPU acceleration to PostgreSQL.

Some of the AI-driven general use-cases in database include but are not limited to:

  • natural language queries
  • retrieval-augmented generation (RAG)
  • plan optimization
  • index creation and destruction
  • database tuning
  • resource allocation and exhaustion
  • malicious activity alert
  • … many more

Hayato Kuroda

Hayato, from Fujitsu, in his talk “Online Upgrade of Replication Clusters Without Downtime”, introduced several new additions in PostgreSQL 17 related to logical replication and how they can be utilized to minimize downtime when upgrading replication clusters.

Traditionally users rely on pg_dump and pg_restore, but these are slow. Some rely on pg_upgrade, but it requires the nodes to be shut down and requires the user to manually re-setup the replication clusters after the upgrade. Hayato emphasized that using “logical replication” is the way to go.

The new tool pg_createsubscriber allows subscriber nodes to reuse data that has already been physically replicated, so it focuses only on incremental logical changes. This eliminates the need to synchronize data (COPY) before logical replication begins.

The pg_createsubscriber tool can basically convert a physical replication standby node into a logical replication subscriber without recopying the data. So it has a great advantage when the amount of data is large.

In PostgreSQL 16 and earlier versions, pg_upgrade does not synchronize logical replication slot information, so it needs to be manually copied and configured before logical replication can continue. In PostgreSQL 17, logical replication slots are synchronized across pg_upgrade, so logical replication can resume automatically and natively.

The pg_createsubscriber tool and the synchronization of the logical replication slot eliminate most of the downtime caused by the upgrade. These features are available in PostgreSQL 17.

Heikki Linnakangas

Heikki, a PostgreSQL committer from Neon, presented a talk “Multi-threaded PostgreSQL” that is considered controversial by some of the community members and started a lot of community discussions.

Heikki introduced:

  • advantages and disadvantages compared to the multi-process model.
  • how to use multiple threads correctly.
  • some of his previous attempts, and the challenges of switching PostgreSQL to multiple threads.

Some of the long term benefits include:

  • cheaper connection – “built-in connection pool”
  • shared relcache and plancache.
  • ability to resize shared memory areas that are currently fixed sized.
  • change settings without restart.
  • track snapshots that are in use to vacuum more aggressively.
  • limit memory usage per session.

Some objections include:

  • not worth the effort.
  • too much incompatibility.
  • extension framework is a problem.
  • worse memory leaks.
  • introduce a lot of bugs, mostly silent.
  • multi-process gives better isolation.

This is quite a controversial topic, and it would take a lot of work to get going. It is generally agreed that it could start with a hybrid approach: serve new connections with threads instead of processes, and add a flag for extensions to indicate whether they are thread-safe. This topic will be ongoing for some time.

Tatsuo Ishii

RPR Query Example:

SELECT *
FROM table_name
MATCH_RECOGNIZE (
    PARTITION BY partition_column
    ORDER BY order_column
    MEASURES measure_definitions
    ONE ROW PER MATCH | ALL ROWS PER MATCH
    AFTER MATCH SKIP TO NEXT ROW | PAST LAST ROW | FIRST ROW
    PATTERN (pattern)
    DEFINE variable AS condition
);

Tatsuo from SRA OSS shared his current work in supporting row pattern recognition feature in PostgreSQL. The initial version of this feature is targeted for release in PostgreSQL 18.

Row Pattern Recognition (RPR) has been defined since SQL:2016. RPR is a powerful feature for analyzing complex data, especially a series of data. With RPR, you can find specific trends in a series of rows by defining search conditions and regular expressions.

The main purpose is to use “regular expressions” to identify specific sequences or conditions of events from data, such as detecting stock price patterns, anomaly detection, etc.
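The regular-expression idea at the heart of RPR can be mimicked outside SQL: classify each row, then match a pattern over the classification string. This is a conceptual sketch only, not how PostgreSQL implements MATCH_RECOGNIZE:

```python
# Conceptual analog of row pattern recognition: label each row by how
# it compares to the previous one (D = down, U = up), then use a
# regular expression to find a "V-shaped" price pattern, i.e. a run
# of declines followed by a run of rises -- like DEFINE DOWN/UP
# variables combined in a PATTERN clause.
import re

prices = [100, 98, 95, 97, 103, 104, 102]
labels = []
for prev, cur in zip(prices, prices[1:]):
    labels.append("D" if cur < prev else "U")
s = "".join(labels)           # "DDUUUD"
match = re.search(r"D+U+", s) # PATTERN (DOWN+ UP+), roughly
print(match.group())  # DDUUU
```

RPR does this declaratively over partitioned, ordered rows inside SQL, which is why it is so convenient for things like stock-price pattern and anomaly detection.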

Some supported RPR clauses include:

  • START
  • UP+
  • DOWN+
  • INITIAL
  • DEFINE
  • PATTERN
  • WINDOW
  • FIRST, LAST, PREV, NEXT
  • MATCH_RECOGNIZE
  • … etc

More will be supported in the future development work. Some of the not supported clauses / features include:

  • SEEK
  • AGGREGATES
  • Subqueries
  • CLASSIFIER
  • … etc

RPR related tests can be found in src/test/regress/sql/rpr.sql.

David E. Wheeler

David from Tembo presented a talk “The Future of the Extension Ecosystem“, which outlined his vision of what PostgreSQL extension ecosystem would look like in the future. He shared the current work-in-progress of his work on PGXN 2.0, which is designed to enhance extension developer experience.

PGXN may be the first Postgres extension registry, but it is definitely not the last! In the past year, new members including database.dev, trunk, etc. have joined, which together show that people are increasingly interested in extensions and eager to expand and improve discovery, packaging and distribution.

PGXN 2.0 aims to enhance the developer experience by providing convenient routines to generate documentation, source code, test scripts, GitHub workflows, metadata, CI scripts, packaging scripts, etc., which can greatly enhance the quality of extension development.

Some of the PGXN 2.0 developer CLI commands:

# create extension
pgxn develop new --extension --language c --vcs github myproject

# upgrade extension
pgxn develop add-upgrade 0.0.2

# other commands
pgxn package build
pgxn package ls
pgxn package install
pgxn upgrade
pgxn delete

Jonathan Katz

Jonathan from AWS is a core member of PostgreSQL and also one of the main organizers of this conference. In his talk “How to Better Support a Nasty Data Type in PostgreSQL“, Jonathan explained the importance and challenges of vectors for efficient storage and indexing of databases.

At the same time, the rapid development of artificial intelligence over the past few years has also led to rapid iteration and releases of pgvector. Jonathan talked about the rapid expansion of database storage and indexing as vector dimensions increase, as well as the challenge of finding neighbor nodes across many dimensions.

pgvector introduced 2 indexing methods:

  • IVFFlat – K-means based
  • HNSW – graph based

both of which can be used with machine learning models to accommodate the rapidly growing demands of data processing.
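The IVFFlat approach can be sketched in a few lines (a toy illustration, not pgvector’s implementation): partition vectors into k-means-style clusters, then probe only the clusters whose centroids are nearest to the query:

```python
# Toy sketch of the IVFFlat idea: vectors are grouped into clusters
# ("lists") around centroids, and a query scans only the nearest
# cluster(s) instead of every vector in the table.
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# two hypothetical clusters with their centroids
centroids = [(0.0, 0.0), (10.0, 10.0)]
lists = {0: [(0.5, 0.2), (1.0, 1.0)], 1: [(9.0, 9.5), (10.5, 10.0)]}

def search(query, probes=1):
    # rank clusters by centroid distance, probe only the closest ones
    ranked = sorted(range(len(centroids)), key=lambda i: dist2(query, centroids[i]))
    candidates = [v for i in ranked[:probes] for v in lists[i]]
    return min(candidates, key=lambda v: dist2(query, v))

print(search((9.2, 9.4)))  # (9.0, 9.5)
```

Probing fewer clusters trades a little recall for a lot of speed; HNSW takes a different, graph-based route to the same approximate-nearest-neighbor goal.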

Keynote Speaker – Margo Seltzer

Margo, a professor at the University of British Columbia (UBC), gave a wonderful keynote speech, “When Hardware and Databases Collide”, that explained methods of utilizing GPUs to accelerate database performance.

Margo also explored the methodology to expand the reach of in-memory database across different machines, achieving potential endless memory access via the use of Compute Express Link (CXL).

CXL is an open standard for high-speed, low-latency interconnect designed to connect processors, memory, accelerators, and other peripherals. It builds on the existing PCIe (Peripheral Component Interconnect Express) infrastructure, extending its capabilities to provide a more coherent and efficient way to share memory and resources across different components in a data center.

In short, to bring database systems like PostgreSQL to the next level, we must figure out a way to harness the computing power of one or more interconnected GPUs, and with CXL, it is possible to access memory far beyond the maximum limit a single machine can handle.

Special Session – Making PostgreSQL Hacking More Inclusive

This is one of the special sessions, designed to share the journeys of 3 new PostgreSQL committers and discuss the work they did to become committers.

The session was led by Robert Haas, and the 3 new committers took turns sharing their journeys to becoming committers. They are:

  • Masahiko Sawada
  • Amit Langote
  • Melanie Plageman

Some challenges in interacting with the community:

  • English as a second language
  • time zone differences.
  • the community is not perfect; emotions often creep into email communications after a long day.

Amit shared that he had problems with email communication due to English; his advice is to communicate directly rather than indirectly.

Masahiko also admitted that English makes email communication harder, so he had to put extra time into drafting emails. Overall, when you have a point, you have to describe it directly rather than indirectly, and try to understand why an email of yours is not getting any replies.

Melanie also stated the importance of being direct in your email communication. Sometimes it is not about the technical interest of a topic but about how you express it. Do not talk around people; acknowledge the positive things people do, so that it is more pleasant to communicate with you.

Unconference

The unconference is a tradition from PGCon where people gather and brainstorm a list of interesting topics to carry into development for the year to come. Each topic is led by one or more leaders with a shared interest. The discussions cover the technical challenges of each topic, define the next TODOs, and decide who will lead the development work. Below are highlights of some of the unconference sessions we attended.

Improve Extension In Core

This unconference mainly discussed possible optimization and improvement options for the current extension architecture and explored the common headaches of extensions:

Problems with API stability:

  • An extension may break after a minor version upgrade (for example, from 15.1 to 15.2) if data structures used by an API call change.
  • Try not to change the API in minor version updates.
  • Is there any tool to detect extension API breakage? Yurii from Omnigres said it may be possible to include minor version information in the extension’s MAGIC block; Omnigres has such a feature.

Lack of Extension documentation:

  • Add a new extension management backend module?
  • Some people suggest adding an extension manager to the PG backend to manage extension versions and compatibility. This ensures the order in which extensions are loaded, resolves dependencies, etc., similar to what Omni is doing, but in the PostgreSQL core.
  • pgxn can add more testing functions on top of this plugin manager.
  • Display all plugin information, version numbers, new SQL functions provided, etc.

If you want to participate in future extension optimization discussions, head to github.com/pgedc. EDC is the abbreviation of Extension Development Coalition.

Improve Observability

This unconference mainly discussed whether the statistics produced by pg_exporter could be generated directly inside the PostgreSQL kernel, making the integration of PostgreSQL statistics more convenient and efficient.

Observability is very important because it is not possible to debug directly with GDB in a production environment; problems can only be found through logs or statistics.

Statistics include:

  • Stack depth
  • lock requests
  • usage, time
  • statement processing frequency
  • load rate
  • autovacuum information, etc.

The statistics module could also achieve a certain degree of self-repair and could be better integrated with OpenTelemetry and Prometheus.

pg_exporter is good, but it is an extension after all, with more overhead, and it cannot obtain some system-level statistics. If PostgreSQL crashes, pg_exporter cannot know about such serious events. Only by making this functionality part of the core can this information be truly and completely captured.

Improve Testing Framework

This unconference mainly discussed whether the current Perl-based test framework could be upgraded to Python’s pytest.

pytest can express more complex tests and is more powerful than the current TAP test architecture.

Jacob Champion, who works on security features, said that he can use pytest to:

  • dynamically generate certificates
  • run complex tests
  • inject malicious packets on the connection to test the system’s response
  • output pcap capture files, which is very helpful for security developers.

Using Python, however, introduces Python library dependency problems, which require a dependency manager or a detailed description of the Python requirements.

Other suggested test improvements include:

  • Can regression test cases be made independent of previous tests, so that each test stands alone?
  • Can we re-run only the failed regression tests instead of all of them?
  • The TAP test documentation is unclear, and it is difficult to add new tests.

These are some of the “wish lists” from the PostgreSQL users, but no solid TODOs have been formulated at the end of this unconference session.

Jacob has already done some framework integration work based on pytest. Check it out here

Future of Build Farm

This unconference mainly discussed whether the current CI architecture (Cirrus CI on GitHub) could be made better.

Some people suggested submitting patches as pull requests, letting CI run directly on each pull request and merging once the tests pass.

But others noted that you can fork PostgreSQL yourself, set up Cirrus CI on your fork, run the tests, and send the patch after it passes; Cirrus CI runs for free on public repositories.

Each approach has its pros and cons, and everyone put forward their own opinions, but in the end the group still voted for the existing CI architecture without pull requests.

It was strongly emphasized in this session that developers must run Cirrus CI on their own repositories to ensure a patch passes the tests before submitting it.

Hallway Talks

While people were busy attending the technical talks, lots of discussion also happened outside of them. Here are some photo highlights of the social events and hallway talks.

Social Events – Dining – Seawall Run

Vancouver is known for its natural beauty and rich assortment of food. Here are some community photos from the conference, along with food photos shared by the attendees.

End of Conference

The conference ended with Jonathan and Melanie giving a closing speech, in which they thanked everyone for attending, recognized the hard work of the staff and volunteers in making the conference possible, and acknowledged the generous support from the sponsors.

All the talks and slides can be found on the pgconf.dev 2024 website. Since this is the first time pgconf.dev was hosted by a new organization committee, it is important to give your feedback about the conference.

Is there anything you liked in particular, or anything that could be done better? Express your thoughts on the feedback page.

Jonathan announced that pgconf.dev 2025 will also be hosted in Canada (spoilers), and it is going to be in one of these cities:

  • Vancouver
  • Toronto
  • Ottawa
  • Montreal

You can follow pgconf.dev on X, Mastodon, or LinkedIn to vote for the city where you would like pgconf.dev to be hosted next year.

Conference recordings will be made available here when ready.