WAL Fetch Performance: Tuning WAL Recovery Speed

Recovery time objective (RTO) is a critical metric for production databases. Slow wal-fetch performance is often the bottleneck in PostgreSQL recovery, especially when replaying hundreds of thousands of WAL segments from S3 or other cloud object storage. Careful tuning can reduce recovery time from hours to minutes.

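Baseline measurement: before tuning, measure your current wal-fetch throughput by timing consecutive WAL file fetches during recovery. PostgreSQL does not report restore_command durations directly, but if log_line_prefix includes a timestamp (for example %m), the intervals between successive "restored log file ... from archive" messages give you a baseline throughput figure.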
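A minimal Python sketch of that calculation follows. It assumes log_line_prefix begins with %m (a millisecond timestamp), that your server emits the "restored log file ... from archive" message at a level you are actually logging (the level can vary by PostgreSQL version and settings), and that WAL segments use the default 16 MiB size. The sample log lines are made up for illustration.

```python
import re
from datetime import datetime

# Assumes log_line_prefix starts with '%m ' so each line begins with
# "YYYY-MM-DD HH:MM:SS.mmm"; the timezone that follows is ignored.
RESTORED = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d{3})?)'
    r'.*restored log file "[0-9A-F]{24}" from archive'
)

SEGMENT_MIB = 16  # default wal_segment_size; change if initdb used --wal-segsize


def baseline(lines):
    """Return (segments seen, elapsed seconds, segments/s, MiB/s)."""
    stamps = []
    for line in lines:
        m = RESTORED.match(line)
        if m:
            stamps.append(datetime.fromisoformat(m.group("ts")))
    if len(stamps) < 2:
        raise ValueError("need at least two 'restored log file' lines")
    elapsed = (stamps[-1] - stamps[0]).total_seconds()
    # The first timestamp marks the start, so N matching lines give N - 1 intervals.
    rate = (len(stamps) - 1) / elapsed if elapsed > 0 else float("inf")
    return len(stamps), elapsed, rate, rate * SEGMENT_MIB


# Made-up sample lines, two seconds apart.
log = [
    '2024-05-01 12:00:00.000 UTC [42] LOG:  restored log file "000000010000000000000001" from archive',
    '2024-05-01 12:00:02.000 UTC [42] LOG:  restored log file "000000010000000000000002" from archive',
    '2024-05-01 12:00:04.000 UTC [42] LOG:  restored log file "000000010000000000000003" from archive',
]
print(baseline(log))  # (3, 4.0, 0.5, 8.0)
```

Feed it the relevant slice of your server log (for example, lines read from the log file) to get a segments-per-second figure you can compare against later runs.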

WALG_DOWNLOAD_CONCURRENCY is often the most impactful setting for wal-fetch performance. It controls how many downloads WAL-G runs in parallel, so raising it (to values such as 4, 8, or even 16, depending on where it currently sits) lets multiple WAL segments be fetched simultaneously, which can dramatically increase throughput on high-bandwidth connections.
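To see why concurrency helps only up to a point, here is a rough, hypothetical throughput model (not WAL-G's actual scheduler): each segment download takes a fixed latency, N downloads run in parallel, and the network link caps total bandwidth. The function name and all inputs are illustrative assumptions.

```python
def estimated_recovery_hours(segments, fetch_latency_s, concurrency,
                             bandwidth_mib_s, segment_mib=16):
    """Rough fetch time under a simple latency/bandwidth model (hypothetical)."""
    # With N parallel downloads of fixed latency, at most N / latency segments/s.
    latency_bound = concurrency / fetch_latency_s
    # The link cannot move more than bandwidth / segment size segments/s.
    bandwidth_bound = bandwidth_mib_s / segment_mib
    rate = min(latency_bound, bandwidth_bound)
    return segments / rate / 3600


# Made-up inputs: 7200 segments, 2 s per download, 1000 MiB/s link.
for c in (1, 4, 16):
    print(c, round(estimated_recovery_hours(7200, 2.0, c, 1000), 2))
# 1 4.0
# 4 1.0
# 16 0.25
```

Under these made-up numbers, going from 1 to 4 concurrent downloads reproduces the 4-hour to 1-hour improvement quoted in this article. Once the bandwidth cap binds, extra concurrency stops helping: with bandwidth_mib_s=100, concurrency 16 and 32 both give 0.32 hours.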

WAL prefetching hides most download latency during sequential WAL replay by fetching the next N WAL segments while the current one is being applied. Check that /.wal-g/prefetch/ is populated during recovery; if it stays empty, prefetching may not be working.
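The following Python simulation illustrates why prefetching helps. It rests on simplifying assumptions (fixed per-segment fetch and apply times, downloads issued in segment order, an unbounded local buffer) and is not a model of WAL-G internals; replay_time and its parameters are hypothetical names.

```python
import heapq


def replay_time(n, fetch_s, apply_s, workers):
    """Simulated wall-clock seconds to fetch and replay n segments in order.

    workers == 0 models no prefetch: each segment is fetched only when replay
    asks for it. workers >= 1 models that many background downloads issued in
    segment order, with an unbounded local buffer (a simplifying assumption).
    """
    if workers == 0:
        return n * (fetch_s + apply_s)
    free = [0.0] * workers          # time at which each download slot is free
    heapq.heapify(free)
    applied = 0.0                   # time the previous segment finished replay
    for _ in range(n):
        start = heapq.heappop(free)     # earliest available download slot
        done = start + fetch_s
        heapq.heappush(free, done)
        # Replay is sequential: wait for this segment and for the previous apply.
        applied = max(applied, done) + apply_s
    return applied


for w in (0, 1, 4):
    print(w, replay_time(100, 1.0, 0.25, w))
# 0 125.0
# 1 100.25
# 4 26.0
```

With fetches four times slower than replay, four parallel prefetch slots bring 100 segments from 125 s down to 26 s; after the first fetch, download time is almost entirely overlapped with replay.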

Storage class matters: if your WAL archives sit in an archival tier such as S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive, objects must be restored before they can be read, which can take minutes to hours. Keep recent WAL archives in S3 Standard and migrate only older archives to cheaper storage tiers.
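One way to spot archived WAL before a recovery depends on it is to tally storage classes. This sketch assumes boto3's list_objects_v2 paginator (each entry in Contents carries Key and StorageClass) and that WAL-G keeps WAL under a wal_005/ folder beneath your configured prefix; verify both for your setup. GLACIER_IR (Glacier Instant Retrieval) is readable directly and is not flagged, and the sketch cannot detect S3 Intelligent-Tiering's optional archive tiers, which report INTELLIGENT_TIERING but still require a restore.

```python
from collections import Counter

# Storage classes whose objects must be restored before GetObject succeeds.
ARCHIVAL = {"GLACIER", "DEEP_ARCHIVE"}


def audit_storage_classes(objects):
    """Summarize objects shaped like list_objects_v2 'Contents' entries.

    Returns (count per storage class, sorted keys that need a restore).
    """
    counts = Counter(obj.get("StorageClass", "STANDARD") for obj in objects)
    blocked = sorted(obj["Key"] for obj in objects
                     if obj.get("StorageClass", "STANDARD") in ARCHIVAL)
    return counts, blocked


def list_wal_objects(bucket, prefix):
    """Yield object entries under prefix (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the audit logic runs without boto3
    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get("Contents", [])


# Made-up sample entries; in practice pass list(list_wal_objects(...)).
sample = [
    {"Key": "wal_005/000000010000000000000001.lz4", "StorageClass": "GLACIER"},
    {"Key": "wal_005/000000010000000000000002.lz4", "StorageClass": "STANDARD"},
]
counts, blocked = audit_storage_classes(sample)
print(dict(counts), blocked)
```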

Network optimization: run WAL-G and PostgreSQL in the same AWS region as your S3 bucket to avoid cross-region latency. If the restore host must run far from the bucket's region, S3 Transfer Acceleration may improve throughput.
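A quick co-location check can be sketched in Python with boto3's get_bucket_location. S3 reports us-east-1 as an empty LocationConstraint and some older eu-west-1 buckets as the legacy value "EU", so the helper normalizes both. How you determine the restore host's own region (instance metadata, AWS_REGION, and so on) is left to the caller; same_region is a hypothetical helper name.

```python
def bucket_region(location_constraint):
    """Normalize get_bucket_location's LocationConstraint to a region name."""
    if not location_constraint:
        return "us-east-1"      # S3 returns an empty constraint for us-east-1
    if location_constraint == "EU":
        return "eu-west-1"      # legacy value for older eu-west-1 buckets
    return location_constraint


def same_region(bucket, host_region):
    """True if the bucket lives in host_region (needs boto3 and s3:GetBucketLocation)."""
    import boto3  # imported lazily so bucket_region works without boto3
    resp = boto3.client("s3").get_bucket_location(Bucket=bucket)
    return bucket_region(resp.get("LocationConstraint")) == host_region
```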

“A fourfold increase in WALG_DOWNLOAD_CONCURRENCY can cut a 4-hour wal-fetch recovery to under 60 minutes on a well-provisioned S3 connection, when downloads rather than replay are the bottleneck.”

Step-by-Step: Tuning wal-fetch

Follow these steps to tune wal-fetch in your PostgreSQL environment.

Step 1

Measure Baseline

Enable verbose WAL-G logging with WALG_LOG_LEVEL=DEVEL and measure current wal-fetch throughput before applying any tuning.
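If DEVEL logs are too noisy to mine, an alternative is to time each restore_command invocation directly. The sketch below assumes a hypothetical wrapper script (timed_fetch.py) and a hypothetical CSV log path. It runs wal-g wal-fetch with PostgreSQL's %f and %p arguments, inherits environment settings such as WALG_LOG_LEVEL, and passes WAL-G's exit code back so recovery behaves as before.

```python
import subprocess
import sys
import time

LOG_PATH = "/tmp/wal_fetch_timings.csv"  # hypothetical location


def timed_fetch(wal_name, dest_path, log_path, cmd=("wal-g", "wal-fetch")):
    """Run the fetch command for one file and append 'name,seconds,exit' to log_path."""
    start = time.monotonic()
    result = subprocess.run([*cmd, wal_name, dest_path])
    elapsed = time.monotonic() - start
    with open(log_path, "a") as log:
        log.write(f"{wal_name},{elapsed:.3f},{result.returncode}\n")
    # A nonzero code tells PostgreSQL the file is unavailable, exactly as before.
    return result.returncode


# Only act as restore_command when invoked with %f and %p.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(timed_fetch(sys.argv[1], sys.argv[2], LOG_PATH))
```

Point restore_command at it, for example restore_command = 'python3 /usr/local/bin/timed_fetch.py %f %p' (path hypothetical). PostgreSQL also probes for files that may not exist, such as timeline history files, so filter on the exit-code column before summing segment timings.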

Step 2

Increase Concurrency

Set WALG_DOWNLOAD_CONCURRENCY=8 and re-run recovery. Monitor whether S3 throughput increases proportionally or whether you start hitting S3 request rate limits (HTTP 503 Slow Down responses).

Step 3

Verify Prefetch

During recovery, run ls /.wal-g/prefetch/ to confirm that files are being prefetched ahead of the current WAL position.
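To turn the ls check into a number, this sketch counts segment files in the prefetch directory that sort after the segment currently being replayed. It assumes completed prefetch files carry bare 24-character WAL names (other entries, such as in-progress subdirectories, are ignored) and that you take the current segment from the latest "restored log file" message, since pg_walfile_name() cannot be called during recovery. Because WAL file names are fixed-width uppercase hex, string comparison matches segment order within a timeline.

```python
import os
import re

SEGMENT = re.compile(r"^[0-9A-F]{24}$")  # timeline + log + segment, 8 hex digits each


def prefetched_ahead(prefetch_dir, current_segment):
    """Count WAL-named files in prefetch_dir that sort after current_segment."""
    names = [n for n in os.listdir(prefetch_dir) if SEGMENT.match(n)]
    return sum(1 for n in names if n > current_segment)


# Example (path as written in this article; adjust to your WAL-G prefetch location):
# print(prefetched_ahead("/.wal-g/prefetch/", "00000001000000000000000A"))
```

A count that stays at zero while recovery is fetching segments matches the "empty prefetch directory" symptom described earlier.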

Step 4

Profile and Iterate

Compare start-to-finish recovery times across concurrency settings to find the best value for your database size and network.
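When comparing runs, one reasonable rule (an assumption, not a WAL-G recommendation) is to pick the smallest concurrency whose recovery time is within a few percent of the fastest, since higher settings add S3 request load for little gain. The function name and the timings below are made up.

```python
def pick_concurrency(results, tolerance=0.05):
    """Smallest concurrency whose time is within `tolerance` of the fastest run.

    results: mapping of concurrency setting -> measured recovery seconds.
    """
    best = min(results.values())
    return min(c for c, secs in results.items() if secs <= best * (1 + tolerance))


# Made-up measurements from four recovery runs.
runs = {4: 5400, 8: 3000, 16: 2900, 32: 2950}
print(pick_concurrency(runs))  # 8
```

Here 16 is fastest, but 8 is within 5% of it, so the rule prefers the lighter setting.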