PostgreSQL Write-Ahead-Logging(WAL) Archiving Functionality

 To recover from PostgreSQL Database crashes Postgres provides WAL archiving functionality from which a standby database can be created by replaying the WAL logs in the recovery site.This interm helps in recovering the database to a particular point in time which is termed as PITR(Point-In-Time_Recovery).

WAL files contain the transaction logs from which the database operation can be replayed.Whenever a database operation is triggered a WAL record is created first and then the corresponding data is flushed to disk. In case of any crash, the lost data is recreated by replaying the WAL record.


To enable WAL recovering below configuration needs to be configured in postgresql.conf file


wal_level = replica

archive_mode=true

archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' # Unix

archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"' # Windows

archive_library= custom_script.sh %p #custom binary



Archived WAL segment files are used in Continuous and PITR recovery.As shown in above diagram once the postgres database is configured with archive command. the archiver process will call the configured archive command whenever a segment wal file fills with wal records or wal file is switched.

WAL switch occurs with any of the below cases
1. Archive timeout 
2. pg_switch_wal()
3. checkpoint occurs
4. timeline switch occurs

Once WAL file is switched, .ready file is created in archive_status directory to indicate that this wal file can be achived. The archiver process archives the wal file and upon successful archive the .ready file is renamed to .done to indicate wal file is successfully archived.


Once wal file is archived the auto vaccum process will recycle the wal file based on configured "max_wal_segment_files".


At the recovery side `restore _command` needs to be configured to pull those archived files into postgres pg_wal directory for wal record replay.Before this a full backup  needs to be taken to create the recovery postgres cluster.the wal replay happens till the configured "recovery_target" which can be a restore point, latest, time , xid.Once the recovery cluster replays all the wal record the cluster can be used for database operation.

Some interesting shielding against various corner cases are below

πŸ‘‰ What happens if the wal file is archived but host crashes before renaming the file from .ready to .done ?  What happens even if file is changed by dirty buffer is not flushed to disk and host crashes ? 

    static void pgarch_archiveDone(char *xlog) handles the successful archive indication by renaming the .ready file to .done using rename() api. The archive command handles any duplicate wal file archive scenario. The better option is to fsync() the directory before recycling wal file / removing wal file.The checkpointer process handles this.

πŸ‘‰ How Archiver process get to know that wal file is ready to archive ? How WaitLatch() does spurious wakeup ?

    Archiver process(copy loop) peridically runs for every 1 min(60 sec) or when archiver latch is set and check for .ready file in archive_status directory.The timeout defined is PGARCH_AUTOWAKE_INTERVAL. This has been done to reduce the power usage.  void SetLatch(Latch *latch) and void PgArchWakeup(void) wakesup the archiver.These mechanism helps in reducing unwanted wakeup or increasing sleep/hilbernation time for power saving.

Previously the backend needs to signal the archiver if it sees any wal files to archive.This has been optimized out by making it an auxiliary process where a latch is used to signal archiver process.

Comments

Popular posts from this blog

RCU Kernel Implementation

Linux FTRACE setup using Terminal and GUI Application created from Python Framework