WAL Fetch Performance: Tuning WAL Recovery Speed
Recovery time objective (RTO) is a critical metric for production databases. Slow wal-fetch performance is often the bottleneck in PostgreSQL recovery, especially when replaying hundreds of thousands of WAL segments from S3 or other cloud storage. Proper tuning can reduce recovery time from hours to minutes.
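Baseline measurement: before tuning, measure your current wal-fetch throughput by recording the time between consecutive WAL file fetches during recovery. PostgreSQL writes a "restored log file ... from archive" message each time restore_command delivers a segment, so with timestamps enabled in log_line_prefix (for example %m), the gaps between those messages give you a baseline throughput figure.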
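A minimal sketch of that measurement in Python, assuming log_line_prefix begins with %m so each line starts with a millisecond timestamp followed by a timezone token, and assuming the default 16 MiB WAL segment size. The function names are illustrative. Note that the gap between two messages includes replay time as well as fetch time, so it measures end-to-end throughput rather than download speed alone.

```python
import re
from datetime import datetime
from statistics import mean

# Matches lines such as:
# 2024-05-01 12:00:00.123 UTC [1234] LOG:  restored log file "0000000100000000000000A1" from archive
# The "%m ..." prefix layout is an assumption; adjust the pattern to your log_line_prefix.
RESTORED = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ .*'
    r'restored log file "([0-9A-F]{24})" from archive'
)

SEGMENT_MIB = 16  # default wal_segment_size; change if the cluster uses another size


def restore_events(lines):
    """Return (timestamp, segment_name) for every restored-segment log line."""
    events = []
    for line in lines:
        match = RESTORED.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((stamp, match.group(2)))
    return events


def baseline(lines):
    """Summarise the mean gap between restores and the implied MiB/s."""
    events = restore_events(lines)
    if len(events) < 2:
        return None  # need at least two restores to measure a gap
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
    avg = mean(gaps)
    rate = SEGMENT_MIB / avg if avg > 0 else float("inf")
    return {"segments": len(events), "mean_gap_s": avg, "mib_per_s": rate}
```

Feed it the recovery log, for example baseline(open(path_to_log)), once before tuning and once after each change, so the figures are comparable.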
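WALG_DOWNLOAD_CONCURRENCY is usually the single most impactful setting for wal-fetch performance. It controls how many downloads WAL-G runs in parallel; values such as 4, 8, or even 16 let multiple WAL segments download simultaneously, which can dramatically increase throughput on high-bandwidth connections. Check the default for your WAL-G version first, because the value you pick may not actually be an increase.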
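One way to try different values is to time a wal-fetch call with the variable set explicitly. The Python sketch below assumes wal-g is on PATH and that the storage settings it needs are already in the environment. The helper names fetch_env and time_fetch are illustrative and not part of WAL-G.

```python
import os
import subprocess
import time


def fetch_env(concurrency, base_env=None):
    """Copy the environment and set WALG_DOWNLOAD_CONCURRENCY on the copy."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["WALG_DOWNLOAD_CONCURRENCY"] = str(concurrency)
    return env


def time_fetch(segment, dest, concurrency, run=subprocess.run):
    """Run `wal-g wal-fetch <segment> <dest>` and return elapsed seconds.

    `run` is injectable so the function can be exercised without wal-g installed.
    """
    argv = ["wal-g", "wal-fetch", segment, dest]
    start = time.monotonic()
    run(argv, env=fetch_env(concurrency), check=True)
    return time.monotonic() - start
```

A single timed fetch is only a rough signal, since prefetching overlaps downloads with replay. The log-derived gap between restored segments during a real recovery is the more reliable comparison.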
WAL prefetching hides most download latency during sequential WAL replay by fetching the next N WAL segments while the current one is being applied. Check that /.wal-g/prefetch/ is populated during recovery; if it is empty, prefetching may not be working.
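The check can be scripted. This Python sketch counts the files under a prefetch directory whose path you supply; where that directory lives on your system is an assumption you need to confirm, and prefetch_status is an illustrative name.

```python
from pathlib import Path


def prefetch_status(prefetch_dir):
    """Report whether the directory exists and how many files and bytes it holds."""
    root = Path(prefetch_dir)
    if not root.is_dir():
        return {"exists": False, "files": 0, "bytes": 0}
    count = 0
    size = 0
    for path in root.rglob("*"):
        try:
            if path.is_file():
                size += path.stat().st_size
                count += 1
        except FileNotFoundError:
            # Segments are moved out as recovery consumes them, so a file
            # can vanish between listing and stat; skip it.
            continue
    return {"exists": True, "files": count, "bytes": size}
```

Run it a few times while recovery is in progress. A file count that stays at zero throughout suggests prefetching is not active.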
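Storage class matters: if your WAL archives are stored in S3 Glacier or a similar archival tier, retrieval latency will be high. Objects in Glacier Flexible Retrieval and Glacier Deep Archive must be restored before they can be read at all, which takes minutes to hours. Keep recent WAL archives in S3 Standard and only migrate older archives to cheaper storage tiers.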
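To spot archives that have already moved to a cold tier, you can inspect the StorageClass field that S3 returns when listing objects. The Python sketch below assumes the third-party boto3 package and working AWS credentials. The bucket, prefix, and function names are placeholders.

```python
# Storage classes whose objects must be restored before they can be read.
RESTORE_REQUIRED = {"GLACIER", "DEEP_ARCHIVE"}


def list_wal_objects(bucket, prefix):
    """Yield object records under a prefix using the list_objects_v2 paginator."""
    import boto3  # third-party; imported lazily so the filter below works without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get("Contents", [])


def cold_wal_keys(objects):
    """Return keys of objects stored in a class that needs a restore first."""
    return [
        obj["Key"]
        for obj in objects
        if obj.get("StorageClass", "STANDARD") in RESTORE_REQUIRED
    ]
```

For example, cold_wal_keys(list_wal_objects("my-backup-bucket", "wal_005/")) lists the segments that would stall a recovery, where both arguments are hypothetical values to replace with your own.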
Network optimization: run WAL-G and PostgreSQL in the same AWS region as your S3 bucket to avoid cross-region data transfer latency. Where the database has to sit far from the bucket, enabling S3 Transfer Acceleration can improve throughput.
“A 4x increase in WALG_DOWNLOAD_CONCURRENCY can reduce a 4-hour wal fetch recovery to under 60 minutes on a well-provisioned S3 connection.”
Step-by-Step: Tuning wal-fetch Performance
Follow these steps to tune wal-fetch in your PostgreSQL environment.
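Measure Baseline: record the time between consecutive WAL fetches during a recovery before changing anything.
Increase Concurrency: set WALG_DOWNLOAD_CONCURRENCY to 4, 8, or 16 and repeat the measurement.
Verify Prefetch: confirm that /.wal-g/prefetch/ fills with segments while recovery runs.
Profile and Iterate: compare each run against the baseline, then review storage class and network placement if throughput is still low.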