There are several solutions out there that can solve distributed database issues (such as Citus) and solutions out there that can solve high availability and database clustering issues (such as patroni). Yes, they do solve distributed and database cluster issues but at the same time make database maintenance and debugging more complicated. Very few developers are willing to do this kind of stuff as they should focus more on application development. As things go cluster or distributed, the cost increases too, to procure more hardware, or to buy more cloud servers.
This is why serverless database is increasing in popularity these days, which provide a fully managed database service that no longer requires application developers to worry about how to do clustering and sharding on their database, nor have to worry about regular database maintenance. Most importantly, it could help them reduce operation costs by dynamically scaling up and down depending on load.
Recently, I had an opportunity to do research on serverless database and evaluate several database vendors’ solutions. (Neon serverless, and cockroachdb to be exact). In this blog, I will do a quick summary of the common problems people face in the world of clustering and high availability and how they solve these problems. To be serverless, these problems need to be solved first.
2.0 Horizontal Scaling on Read and Write
PG has streaming replication that consists of one primary node (can read + write) and several standby nodes (can only read), so with PG, we can achieve horizontal scaling on read operations, normally with a HA proxy in the front separating read loads among all standby nodes. Neon is built based on PG and shares very similar architecture. It also has a single write node + multiple read nodes, so it, like PG, cannot support horizontal scaling for write.
Cockroachdb on the other hand can support horizontal scaling on read and write. Entire database is split into multiple
ranges and each node is responsible for
write operations on its designated range. This node is also referred as
leaseholder according to the documentation. If a node receives a read or write request on a data that is not within the
range of current node, it will forward it to the leaseholder responsible for the data. In other words, every node can receive read or write request and on the backend ,they will figure out the designated leaseholder and forward the request there.
3.0 Data Visibility Consistency
Based on PG’s streaming replication, when primary makes changes to database, it will send the change via WAL segments to all standbys to REDO. If for some reason, a standby has not REDO-ed or catched up to primary’s current LSN while receiving a read request, it may return the old value before primary changes it. Neon made some enhancements on this issue; it separate compute and storage nodes. When primary makes changes, it sends the to safekeeper nodes, which in turn, shares it to page server node instead of directly to standby nodes. When standby receives a read request, it will figure out the LSN related to the target data page and request page server to return it. If page server has not received any WAL related to the given LSN, it makes the standby wait until it receives the LSN. This
wait on LSN gurantees the consistency.
Cockroachdb, on the other hand, uses different approach. Read and write requests are internally forwarded to the leaseholder node to process via
raft protocol. Normally, there should not be a consistency issue because both reads and writes are handled by a single node. If a write has not committed yet while a read is received, this node still has to
wait for the write to complete before it can
4.0 Data Storage
PG internally uses
buffer tag as a look-up key for each data page and in streaming replication, each node has its own copy of data storage. In Neon, it uses
block number and
LSN to be the look-up key for data pages. So every page requests sent to the
page server has a LSN associated with it, the page server uses this LSN value to derive a data page by finding the last known page and REDO WAL logs on it up to the given LSN. Neon uses a shared storage approach where the data is ultimately stored on cloud storage server and each node access the storage via page server service. This is a major change done by Neon to separate storage and compute nodes
Cockroachdb uses a key-value based storage called
pebble, and key is simply a number that represents a data page. Each node maintains its own storage and every updates have to go through the raft protocol to make other nodes agree on the change. The
raft protocol itself is also used to replicate database changes to all nodes
This is only a very high level summary of how Neon and Cockroachdb handles clustering and high availability issues before they can be deployed serverless. There is a lot of details to what’s been described here. Feel free to visit their documentation page (neon and cockroachdb) to learn more.
Cary is a Senior Software Developer in HighGo Software Canada with 8 years of industrial experience developing innovative software solutions in C/C++ in the field of smart grid & metering prior to joining HighGo. He holds a bachelor degree in Electrical Engineering from University of British Columnbia (UBC) in Vancouver in 2012 and has extensive hands-on experience in technologies such as: Advanced Networking, Network & Data security, Smart Metering Innovations, deployment management with Docker, Software Engineering Lifecycle, scalability, authentication, cryptography, PostgreSQL & non-relational database, web services, firewalls, embedded systems, RTOS, ARM, PKI, Cisco equipment, functional and Architecture Design.