REPACK is a new PostgreSQL 19 feature for physically compacting a table by rewriting it into new storage. Like VACUUM, it deals with the space left behind by dead tuples, but it does so by building a fresh table file instead of mostly cleaning pages in place. Ordinary VACUUM can mark space reusable inside the table and may truncate some empty pages at the end, but it usually cannot fully return bloat to the operating system. REPACK, like VACUUM FULL, rewrites the table into a compact file and swaps that storage into place. The important difference from VACUUM FULL is that REPACK CONCURRENTLY keeps the table usable for most of the operation by copying a snapshot and replaying concurrent changes before a short final lock-and-swap phase.
The REPACK code is interesting because it sits between several difficult subsystems: table rewrites, index rebuilds, relfilenode swaps, logical decoding, background workers, snapshots, and lock management. Reading repack.c is a good way to understand how PostgreSQL can physically rebuild a table while preserving the table’s logical identity.
At a high level, REPACK creates a new physical copy of a table, fills it with live tuples from the old table, rebuilds or swaps indexes, and then swaps the physical storage underneath the original relation OID. The user still sees the same table OID, privileges, dependencies, inheritance relationships, and catalog identity, but the heap file is new and compact.
The file comment at the top of repack.c summarizes the two modes:
- non-concurrent mode: take AccessExclusiveLock, rewrite the table, swap storage, drop the old storage
- concurrent mode: take ShareUpdateExclusiveLock, copy the table while writes continue, decode concurrent changes from WAL, replay them into the new heap, briefly upgrade to AccessExclusiveLock, apply remaining changes, then swap
That split drives almost every design choice in the file.
Entry Point
The main SQL entry point is ExecRepack() in repack.c. It parses options like VERBOSE, ANALYZE, and CONCURRENTLY, chooses the lock level, resolves the relation or relation list, and eventually calls cluster_rel().
One slightly confusing detail is historical: this file still uses names like cluster_rel() because REPACK, CLUSTER, and VACUUM FULL share table rewrite machinery. VACUUM FULL calls into this path from vacuum.c, while CLUSTER and REPACK differ mainly in whether an index order is requested and whether concurrent processing is allowed.
The lock level is centralized in RepackLockLevel():
if (concurrent)
    return ShareUpdateExclusiveLock;
else
    return AccessExclusiveLock;
That is the first major design point. Non-concurrent rewriting is comparatively simple because AccessExclusiveLock keeps every other reader and writer away from the table. Concurrent rewriting is harder because the old table remains readable and writable while the new copy is being built.
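To see why the lock choice decides whether writes can continue, here is a minimal sketch of PostgreSQL's documented table-level lock conflict rules as a bitmask table. The type and function names (ToyLockMode, toy_lock_conflicts) are invented for this illustration and are not taken from repack.c or the lock manager.

```c
#include <assert.h>
#include <stdbool.h>

/* Table-level lock modes, weakest first. Names are local to this sketch. */
typedef enum {
    ACCESS_SHARE = 1,        /* plain SELECT */
    ROW_SHARE,               /* SELECT ... FOR UPDATE */
    ROW_EXCLUSIVE,           /* INSERT, UPDATE, DELETE */
    SHARE_UPDATE_EXCLUSIVE,  /* concurrent mode, per the article */
    SHARE,
    SHARE_ROW_EXCLUSIVE,
    EXCLUSIVE,
    ACCESS_EXCLUSIVE         /* non-concurrent mode, per the article */
} ToyLockMode;

#define BIT(mode) (1u << (mode))

/* toy_conflicts[m] is the set of modes that cannot coexist with m. */
static const unsigned toy_conflicts[ACCESS_EXCLUSIVE + 1] = {
    [ACCESS_SHARE] = BIT(ACCESS_EXCLUSIVE),
    [ROW_SHARE] = BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [ROW_EXCLUSIVE] = BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                      BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [SHARE_UPDATE_EXCLUSIVE] = BIT(SHARE_UPDATE_EXCLUSIVE) | BIT(SHARE) |
                               BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
                               BIT(ACCESS_EXCLUSIVE),
    [SHARE] = BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
              BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
              BIT(ACCESS_EXCLUSIVE),
    [SHARE_ROW_EXCLUSIVE] = BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
                            BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                            BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [EXCLUSIVE] = BIT(ROW_SHARE) | BIT(ROW_EXCLUSIVE) |
                  BIT(SHARE_UPDATE_EXCLUSIVE) | BIT(SHARE) |
                  BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
                  BIT(ACCESS_EXCLUSIVE),
    [ACCESS_EXCLUSIVE] = BIT(ACCESS_SHARE) | BIT(ROW_SHARE) |
                         BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
                         BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                         BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
};

/* True if a session holding 'held' blocks a request for 'requested'. */
bool toy_lock_conflicts(ToyLockMode held, ToyLockMode requested)
{
    assert(held >= ACCESS_SHARE && held <= ACCESS_EXCLUSIVE);
    assert(requested >= ACCESS_SHARE && requested <= ACCESS_EXCLUSIVE);
    return (toy_conflicts[held] & BIT(requested)) != 0;
}
```

With this table, toy_lock_conflicts(SHARE_UPDATE_EXCLUSIVE, ROW_EXCLUSIVE) is false, so ordinary DML proceeds during the long phase of a concurrent repack, while ACCESS_EXCLUSIVE conflicts with every mode, including plain reads. The weaker lock also conflicts with itself, so two such operations cannot run on the same table at once.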
The Core Rewrite
The core work happens in cluster_rel() and rebuild_relation().
cluster_rel() performs the permission checks, relation checks, index checks, security context switching, and progress reporting. It also handles concurrent-mode eligibility. For REPACK CONCURRENTLY, check_concurrent_repack_requirements() enforces several important restrictions:
- system catalogs are rejected
- TOAST relations are rejected
- only permanent relations are allowed
- REPLICA IDENTITY NOTHING is rejected, because WAL does not contain the old tuple
- the table must have an identity index, either replica identity index or non-deferrable primary key
That identity index matters later. During concurrent replay, updates and deletes must find the corresponding tuple in the new heap. The code does that by looking up rows through the identity index.
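The restrictions listed above can be read as a chain of early rejections. The following is a toy model only: ToyRelation, ToyRepackCheck, and toy_check_concurrent_repack() are hypothetical names for this sketch, and the real check_concurrent_repack_requirements() works on relcache entries and reports errors rather than returning a code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_PERMANENT, TOY_UNLOGGED, TOY_TEMP } ToyPersistence;
typedef enum { TOY_IDENTITY_DEFAULT, TOY_IDENTITY_INDEX,
               TOY_IDENTITY_NOTHING } ToyReplicaIdentity;

/* Hypothetical, flattened view of the facts the check needs. */
typedef struct {
    bool is_system_catalog;
    bool is_toast;
    ToyPersistence persistence;
    ToyReplicaIdentity replica_identity;
    bool has_identity_index;  /* replica identity index or
                               * non-deferrable primary key */
} ToyRelation;

typedef enum {
    REPACK_OK,
    REJECT_SYSTEM_CATALOG,
    REJECT_TOAST_RELATION,
    REJECT_NOT_PERMANENT,
    REJECT_IDENTITY_NOTHING,
    REJECT_NO_IDENTITY_INDEX
} ToyRepackCheck;

/* Apply the article's restrictions in order; first failure wins. */
ToyRepackCheck toy_check_concurrent_repack(const ToyRelation *rel)
{
    assert(rel != NULL);
    if (rel->is_system_catalog)
        return REJECT_SYSTEM_CATALOG;
    if (rel->is_toast)
        return REJECT_TOAST_RELATION;
    if (rel->persistence != TOY_PERMANENT)
        return REJECT_NOT_PERMANENT;
    if (rel->replica_identity == TOY_IDENTITY_NOTHING)
        return REJECT_IDENTITY_NOTHING;
    if (!rel->has_identity_index)
        return REJECT_NO_IDENTITY_INDEX;
    return REPACK_OK;
}
```

The order of the checks here is an assumption made for the sketch; what matters is that every condition must hold before the concurrent path is allowed to start.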
rebuild_relation() then creates the new heap with make_new_heap(), copies data with copy_table_data(), and finishes differently depending on concurrent vs non-concurrent mode.
In non-concurrent mode, the path is straightforward:
- old heap locked AccessExclusive
- create new heap
- copy visible/live data into new heap
- close old/new relcache entries
- finish_heap_swap(…)
- reindex old logical relation
- drop transient relation
The key operation is finish_heap_swap(), which calls swap_relation_files(). PostgreSQL keeps the old logical relation OID but swaps physical identity: relfilenode, tablespace, access method, persistence, TOAST links, and statistics.
The table is not “renamed into place” like renaming a file. PostgreSQL updates catalog metadata so the original relation points to the new storage.
Copying Data
copy_table_data() delegates the actual scan/copy to the table access method:
table_relation_copy_for_cluster(...)
Before that, it decides whether to use:
- index scan, when clustering by an index
- sequential scan plus sort, when that is cheaper for btree clustering
- plain sequential scan, when no ordering is requested
It also computes aggressive vacuum cutoffs. Since the table is being rewritten anyway, this is an opportunity to remove dead tuples and set a newer relfrozenxid / relminmxid.
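The three-way scan choice described above can be sketched as a small decision function. This is an illustration under stated assumptions: the enum, the function name, and the idea of passing two precomputed cost estimates are all hypothetical, and the real code asks the planner for the comparison.

```c
#include <stdbool.h>

typedef enum {
    TOY_COPY_SEQSCAN,       /* no ordering requested */
    TOY_COPY_SEQSCAN_SORT,  /* scan the heap, then sort the tuples */
    TOY_COPY_INDEXSCAN      /* read the heap in index order */
} ToyCopyStrategy;

/*
 * Pick a copy strategy. The cost arguments are hypothetical estimates;
 * they are only consulted when a btree ordering index is present.
 */
ToyCopyStrategy toy_choose_copy_strategy(bool has_order_index,
                                         bool index_is_btree,
                                         double index_scan_cost,
                                         double seqscan_sort_cost)
{
    if (!has_order_index)
        return TOY_COPY_SEQSCAN;
    if (index_is_btree && seqscan_sort_cost < index_scan_cost)
        return TOY_COPY_SEQSCAN_SORT;
    return TOY_COPY_INDEXSCAN;
}
```

The sort-based path only exists for btree ordering in this sketch because a sort needs a comparison order that matches what the index would have produced.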
TOAST handling is subtle. In non-concurrent mode, if both old and new heaps have TOAST tables, the code may use “toast swap by content”. That preserves TOAST pointer validity for cases like system catalogs. In concurrent mode, this is disabled because replayed deletes/updates may need to manipulate TOAST data in the new heap, and the old-TOAST-pointer trick would become unsafe.
The Storage Swap
swap_relation_files() is one of the most important functions in the file. It swaps physical storage while preserving logical identity.
For normal relations, it swaps fields in pg_class:
- relfilenode
- reltablespace
- relam
- relpersistence
- optionally reltoastrelid
- size statistics
- freeze metadata
For mapped relations, it cannot simply update pg_class.relfilenode, because mapped relations use the relmapper. In that case it updates relation mappings instead.
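For the ordinary (non-mapped) case, the idea of swapping storage fields while keeping the OID fixed can be shown with a toy catalog row. ToyPgClass and toy_swap_relation_files() are invented for this sketch; the real function also handles TOAST links, statistics, freeze metadata, and the relmapper, none of which is modelled here.

```c
#include <assert.h>

typedef unsigned int Oid;

/* A heavily simplified pg_class row. */
typedef struct {
    Oid  oid;             /* logical identity: never swapped */
    Oid  relfilenode;     /* physical file identity */
    Oid  reltablespace;
    Oid  relam;
    char relpersistence;
} ToyPgClass;

/* Exchange the physical-storage fields of two rows; OIDs stay put. */
void toy_swap_relation_files(ToyPgClass *a, ToyPgClass *b)
{
    assert(a != b);
    ToyPgClass tmp = *a;

    a->relfilenode    = b->relfilenode;
    a->reltablespace  = b->reltablespace;
    a->relam          = b->relam;
    a->relpersistence = b->relpersistence;

    b->relfilenode    = tmp.relfilenode;
    b->reltablespace  = tmp.reltablespace;
    b->relam          = tmp.relam;
    b->relpersistence = tmp.relpersistence;
}
```

After the swap, the original row still has its OID but points at the compact file, and the transient row now owns the bloated file, so dropping the transient relation is what releases the old storage.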
finish_heap_swap() wraps this with the rest of the cleanup:
- report progress phase SWAP_REL_FILES
- call swap_relation_files()
- rebuild indexes if requested
- drop the transient table
- remove temporary relation mappings
- rename TOAST relations if needed
- clear missing attribute metadata for non-catalog tables
This is why REPACK and VACUUM FULL reclaim disk space: the compacted table becomes the real storage, and the old bloated storage is attached to the transient relation and then dropped.
Concurrent Mode
Concurrent repack is the more interesting path.
The problem is simple to state: while PostgreSQL copies the old heap into the new heap, other sessions may insert, update, or delete rows in the old heap. If the final swap ignored those changes, the new heap would be stale.
This implementation solves that with logical decoding.
In rebuild_relation(), concurrent mode does several extra things:
- become a lock group leader, because later the background worker needs to join the group
- start a decoding background worker
- wait for the worker to initialize logical decoding (see the later section “Does REPACK Require wal_level = logical?” for more details)
- get an initial historic snapshot
- copy the old heap using that snapshot
- build new indexes on the new heap
- decode and apply concurrent changes
- take AccessExclusiveLock
- decode and apply final changes
- swap heap and index storage
The worker is started by start_repack_decoding_worker(). Shared state lives in DecodingWorkerShared, defined in repack_internal.h. The backend and worker coordinate through:
- dynamic shared memory
- SharedFileSet
- condition variables
- spinlock-protected shared fields
- shared message queue for worker errors/notices
The worker writes decoded changes into files. The first exported file contains the snapshot. Later files contain changes.
process_concurrent_changes() asks the worker to decode up to a specific LSN. It sets shared->lsn_upto, waits until the worker exports the expected file number, opens that file, and calls apply_concurrent_changes().
Applying Concurrent Changes
The replay side is deliberately low-level.
apply_concurrent_changes() reads a stream of change records. The change kinds are defined in repack_internal.h:
#define CHANGE_INSERT 'i'
#define CHANGE_UPDATE_OLD 'u'
#define CHANGE_UPDATE_NEW 'U'
#define CHANGE_DELETE 'd'
For inserts, it inserts the decoded tuple into the new heap and updates indexes:
table_tuple_insert(..., TABLE_INSERT_NO_LOGICAL, ...)
ExecInsertIndexTuples(...)
For deletes, it finds the matching tuple in the new heap using the identity index and deletes it:
find_target_tuple(...)
table_tuple_delete(..., TABLE_DELETE_NO_LOGICAL, ...)
For updates, it may receive an old tuple and a new tuple. It uses the old tuple as the lookup key when available, locates the existing tuple in the new heap, adjusts TOAST pointers if needed, and performs table_tuple_update().
The TABLE_*_NO_LOGICAL flags are important. These replay operations should not themselves be decoded as new logical changes, or the system could feed its own changes back into the stream.
find_target_tuple() is where the identity index requirement pays off. It builds scankeys from the identity index columns and uses an index scan on the new heap to find the tuple corresponding to a decoded update/delete.
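The replay loop can be modelled on a toy heap where each row has an integer identity key. Only the four change-kind characters come from the article; every type and function name below is hypothetical, and a linear search stands in for the identity-index scan. The sketch also assumes that an old-tuple record, when present, immediately precedes its new-tuple record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHANGE_INSERT     'i'
#define CHANGE_UPDATE_OLD 'u'
#define CHANGE_UPDATE_NEW 'U'
#define CHANGE_DELETE     'd'

#define TOY_HEAP_CAPACITY 16

typedef struct { int key; int value; bool live; } ToyRow;
typedef struct { ToyRow rows[TOY_HEAP_CAPACITY]; size_t nrows; } ToyHeap;
typedef struct { char kind; int key; int value; } ToyChange;

/* Stand-in for the identity-index lookup: find a live row by key. */
ToyRow *toy_find_target(ToyHeap *heap, int key)
{
    for (size_t i = 0; i < heap->nrows; i++)
        if (heap->rows[i].live && heap->rows[i].key == key)
            return &heap->rows[i];
    return NULL;
}

void toy_insert(ToyHeap *heap, int key, int value)
{
    assert(heap->nrows < TOY_HEAP_CAPACITY);
    heap->rows[heap->nrows++] = (ToyRow){ key, value, true };
}

/* Replay a decoded change stream into the new (toy) heap. */
void toy_apply_changes(ToyHeap *heap, const ToyChange *changes, size_t n)
{
    bool have_old = false;  /* did an old-tuple record precede this update? */
    int  old_key = 0;

    for (size_t i = 0; i < n; i++) {
        const ToyChange *c = &changes[i];
        ToyRow *target;

        switch (c->kind) {
        case CHANGE_INSERT:
            toy_insert(heap, c->key, c->value);
            break;
        case CHANGE_UPDATE_OLD:
            old_key = c->key;   /* remember the lookup key */
            have_old = true;
            break;
        case CHANGE_UPDATE_NEW:
            /* Use the old tuple's key if one was sent, else the new key. */
            target = toy_find_target(heap, have_old ? old_key : c->key);
            assert(target != NULL);
            target->key = c->key;
            target->value = c->value;
            have_old = false;
            break;
        case CHANGE_DELETE:
            target = toy_find_target(heap, c->key);
            assert(target != NULL);
            target->live = false;
            break;
        default:
            assert(!"unknown change kind");
        }
    }
}
```

An update that changes the identity key arrives as a 'u' record followed by a 'U' record, which is why the loop carries the old key forward. Without a usable identity key there would be no way to locate the target row, which is the reason for the identity-index requirement.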
Why There Are Two Catch-Up Passes
The concurrent finishing path, rebuild_relation_finish_concurrent(), applies changes twice.
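First, after copying the heap and building new indexes, it flushes WAL and applies changes up to that point while still holding only the weaker lock. This minimizes the backlog. Second, after upgrading to AccessExclusiveLock, it applies whatever changes arrived in the meantime, which should be a much smaller set.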
The flush is important. The decoding worker does not consume arbitrary WAL that only exists in WAL buffers. Its WAL reader is bounded by the flushed WAL position. Before the catch-up pass, the main backend calls:
XLogFlush(GetXLogInsertEndRecPtr());
end_of_wal = GetFlushRecPtr(NULL);
process_concurrent_changes(end_of_wal, &chgcxt, false);
This is the core concurrency strategy:
long phase:
    weak lock
    copy heap
    build new indexes
    catch up with decoded WAL
short phase:
    strong lock
    final WAL catch-up
    swap files
The whole design is about making the strong-lock phase small.
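A small model shows why the first pass shrinks the second. It treats the change stream as a sorted array of LSNs and applies everything up to a bound. ToyChangeStream and toy_catch_up() are hypothetical names, and treating the bound as inclusive is an assumption of this sketch, not a statement about the real decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t ToyLsn;

typedef struct {
    const ToyLsn *change_lsns;  /* LSNs of decoded changes, ascending */
    size_t nchanges;
    size_t next;                /* index of the first unapplied change */
} ToyChangeStream;

/*
 * Apply every not-yet-applied change whose LSN is <= upto and
 * return how many were applied in this pass.
 */
size_t toy_catch_up(ToyChangeStream *stream, ToyLsn upto)
{
    size_t applied = 0;

    assert(stream != NULL);
    while (stream->next < stream->nchanges &&
           stream->change_lsns[stream->next] <= upto) {
        stream->next++;   /* a real implementation would replay here */
        applied++;
    }
    return applied;
}
```

If five changes exist and four of them fall at or below the flushed WAL position seen during the weak-lock pass, only one change remains to be replayed while AccessExclusiveLock is held. The work done under the strong lock is proportional to what arrived after the first pass, not to the total write traffic during the copy.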
Does REPACK Require wal_level = logical?
One easy trap is to equate “uses logical decoding” with “requires wal_level = logical”. In this implementation, concurrent REPACK uses logical decoding internally, but it does not necessarily require the server GUC wal_level to be set to logical.
The decoding worker initializes this path in repack_setup_logical_decoding(). That function is similar to pg_create_logical_replication_slot(), but the slot it creates is private to REPACK: it is kept acquired for the duration of the operation, and it is temporary rather than persistent.
Keeping the slot acquired matters for correctness. Concurrent REPACK must decode every committed row change that happens after its initial snapshot and before the final storage swap. If the slot were released, another backend could consume from it and advance the decoding position. REPACK could then miss changes that need to be applied to the new heap.
Making the slot temporary is also intentional. A concurrent repack is not a crash-resumable operation. If the server crashes halfway through copying, decoding, applying changes, or swapping files, it is simpler and safer to discard the slot and restart the whole operation later.
The relevant setup looks like this:
CheckLogicalDecodingRequirements(true);
ReplicationSlotCreate(..., RS_TEMPORARY, ...);
EnsureLogicalDecodingEnabled();
CreateInitDecodingContext("pgrepack", ...);
Index Handling
Non-concurrent mode usually swaps the heap and then rebuilds indexes on the original logical relation.
Concurrent mode cannot wait until the strong-lock phase to build indexes, because that could take a long time. Instead, build_new_indexes() creates matching indexes on the new heap before taking AccessExclusiveLock.
Later, during the final swap, the code swaps storage for each old/new index pair. That preserves the logical identity of the original indexes while replacing their physical contents.
This is why rebuild_relation_finish_concurrent() keeps ind_oids_old and ind_oids_new in matching order.
Progress Reporting
The file also wires into PostgreSQL progress reporting through pgstat_progress_update_param(). The phases are defined in progress.h:
- sequential scan heap
- index scan heap
- sort tuples
- write new heap
- catch up
- swap relation files
- rebuild index
- final cleanup
That is a useful map for reading the code. Each phase corresponds to a major structural step in the rewrite.
Summary
A compact way to understand REPACK is this:
REPACK is a table rewrite that preserves logical identity.
Non-concurrent REPACK:
    block writers/readers strongly
    copy live data
    swap storage
    rebuild indexes
    drop old storage
Concurrent REPACK:
    allow writes during most work
    copy data from a historic snapshot
    decode changes from WAL
    replay changes into the new heap
    briefly block writes
    replay final changes
    swap heap and index storage
The most important distinction from ordinary VACUUM is that REPACK creates new storage. VACUUM mostly cleans within existing storage and usually cannot return all bloat to the filesystem. REPACK, like VACUUM FULL, rewrites the table and can physically compact it. The concurrent version adds logical decoding to make that rewrite happen with a much shorter exclusive-lock window.
From a code-analysis perspective, repack.c is a good example of PostgreSQL’s style: logical database identity lives in catalogs, physical storage can be swapped underneath it, and correctness comes from carefully combining locks, snapshots, WAL, relcache invalidation, and catalog updates.
