Rethinking Disaster Recovery for MPP Databases
Why massively parallel processing (MPP) Postgres platforms like Greenplum, WarehousePG, and Cloudberry need both logical backups and modern Point-In-Time Recovery to meet real disaster recovery (DR) requirements.
Backup and disaster recovery for massively parallel Postgres systems like Greenplum, WarehousePG, and Cloudberry requires a different strategy than traditional single-node PostgreSQL. Logical tools such as gpbackup remain essential, but they cannot fully protect a distributed environment on their own.
For true end-to-end protection, modern MPP systems require a dual approach:
- gpbackup: logical portability, compliance, and table-level recovery
- PITR (GPDR or WAL-G): full-cluster consistency, timestamp-based recovery, and DR-ready replicas
Why Logical Backups Alone Are Not Enough
As clusters grow into the multi-terabyte or petabyte range, tools like pg_dump and even gpbackup begin to hit hard limits:
- High I/O and CPU load during parallel table unloads
- Lock contention that can slow or block user workloads
- Long backup windows that don’t fit 24×7 ingest pipelines
- No ability to restore the system to a specific timestamp
- No concept of a warm standby or DR replica ready to be promoted
None of this means gpbackup is obsolete. It means that in an MPP world, logical backups are necessary but not sufficient if your goal is full-cluster disaster recovery with tight recovery point and recovery time objectives (RPO and RTO).
A Recent Use Case
The case for GPDR was clear for a large finance and banking customer. Full logical backups of their 800 TB Greenplum data warehouse took 48 hours to complete, and a restore took just as long. That left them exposed to at least two days of downtime in a DR event, plus significant data loss. After careful analysis, and once infrastructure to support it had been identified, the GPDR implementation reduced both RTO and RPO to just 15 minutes, satisfying both regulatory and internal requirements. It also greatly reduced lock contention, which yielded a significant performance improvement.
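To put the customer's figures in perspective, here is a rough Python calculation of the average throughput implied by moving 800 TB in 48 hours. It assumes decimal terabytes and a constant transfer rate, neither of which the case study states.

```python
# Back-of-envelope throughput implied by the case-study figures.
# Assumption: decimal units (1 TB = 10**12 bytes); the case study does not say TB vs TiB.
TB = 10**12
GB = 10**9


def sustained_rate_gb_per_s(size_tb: float, hours: float) -> float:
    """Average rate in GB/s needed to move size_tb terabytes in the given number of hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return size_tb * TB / (hours * 3600) / GB


if __name__ == "__main__":
    rate = sustained_rate_gb_per_s(800, 48)
    print(f"800 TB in 48 h needs about {rate:.2f} GB/s sustained")
```

At roughly 4.6 GB/s sustained, every full logical backup cycle re-reads the entire dataset regardless of how little of it changed. Even with backups running back to back, recovery points stay at least 48 hours apart. Continuous WAL archiving breaks that coupling, so recovery points are no longer tied to the time it takes to copy the whole database.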
The Essential Role of gpbackup
Even in large analytics platforms, gpbackup remains indispensable:
- Schema-level and table-level restores
- Logical dumps for audits and long-term archives
- Data portability between clusters, environments, and regions
- Controlled, object-level repair operations
- Compliance and retention requirements that expect logical copies
gpbackup is the logical safety net that complements PITR. It is the right tool when you need to restore just a subset of data, keep long-term archives, or move data between environments in a controlled, application-aware way.
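To illustrate object-level recovery, here is a small Python sketch that builds a `gprestore` command to restore a single table from a gpbackup set into a separate database. It uses gprestore options documented for the tool (`--timestamp`, `--include-table`, `--redirect-db`, `--create-db`, `--backup-dir`, `--jobs`). Option compatibility varies by version, so treat the exact flags as an assumption and check `gprestore --help` on your installation. The helper `table_restore_cmd`, the `sales.orders` table, and the `sales_repair` database are hypothetical.

```python
"""Build a gprestore argv for object-level recovery (illustrative sketch).

Assumption: flag spellings follow the gprestore documentation; verify them against
`gprestore --help` for your installed version before relying on this.
"""
import re
import shlex
from typing import List, Optional, Sequence

# gpbackup identifies each backup set by a 14-digit timestamp (YYYYMMDDHHMMSS).
TIMESTAMP_RE = re.compile(r"\d{14}")


def table_restore_cmd(
    timestamp: str,
    tables: Sequence[str],
    redirect_db: Optional[str] = None,
    create_db: bool = False,
    backup_dir: Optional[str] = None,
    jobs: int = 1,
) -> List[str]:
    """Return a gprestore argv that restores only the listed schema.table objects."""
    if not TIMESTAMP_RE.fullmatch(timestamp):
        raise ValueError(f"expected a 14-digit backup timestamp, got {timestamp!r}")
    if not tables:
        raise ValueError("at least one schema-qualified table is required")
    cmd = ["gprestore", "--timestamp", timestamp]
    for table in tables:
        # Simplification: quoted identifiers that contain dots are not handled.
        if table.count(".") != 1:
            raise ValueError(f"table must be schema-qualified: {table!r}")
        cmd += ["--include-table", table]
    if redirect_db:
        # Restore into a separate database rather than over the production objects.
        cmd += ["--redirect-db", redirect_db]
    if create_db:
        # Ask gprestore to create the target database before restoring metadata.
        cmd.append("--create-db")
    if backup_dir:
        cmd += ["--backup-dir", backup_dir]
    if jobs > 1:
        cmd += ["--jobs", str(jobs)]
    return cmd


if __name__ == "__main__":
    # Hypothetical backup timestamp, table, and scratch database.
    argv = table_restore_cmd("20240115083000", ["sales.orders"],
                             redirect_db="sales_repair", create_db=True)
    print(shlex.join(argv))
```

Restoring into a redirected database and then copying the needed rows back lets you repair one table without touching the rest of production. The helper returns an argument list rather than a shell string, so it can be passed directly to `subprocess.run` without quoting problems.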
Why Point-In-Time Recovery Becomes Mandatory in MPP Systems
Point-In-Time Recovery (PITR) addresses the biggest gaps left by logical backups:
- A coordinated base backup across all segments
- Continuous, low-impact archiving of Write-Ahead Logs (WAL)
- The ability to rewind a cluster to a specific moment in time
- Higher ingest tolerance with minimal locking overhead
- Fast, repeatable cluster recovery after an incident
- Support for warm or hot disaster recovery environments
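The mechanics behind rewinding to a specific moment can be modelled in a few lines. You pick the newest base backup that was already consistent before the target time, then confirm that archived WAL covers everything from that backup's start up to the target. The Python sketch below is a simplified model that uses timestamps where PostgreSQL actually tracks WAL positions (LSNs). The class and function names are hypothetical and do not belong to any tool's API.

```python
"""Choose a base backup and check WAL coverage for a PITR target (simplified model).

Assumption: timestamps stand in for WAL positions (LSNs), which is what
PostgreSQL actually uses; names here are illustrative, not any tool's API.
"""
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class BaseBackup:
    label: str
    start: datetime  # WAL replay must begin from here
    stop: datetime   # earliest moment the restored copy is consistent


@dataclass(frozen=True)
class WalRange:
    start: datetime  # this archived stretch of WAL covers start..end
    end: datetime


def wal_covers(wal: Sequence[WalRange], start: datetime, end: datetime) -> bool:
    """True if the archived ranges cover start..end with no gaps."""
    cursor = start
    for r in sorted(wal, key=lambda w: w.start):
        if r.start > cursor:
            break  # gap: WAL between cursor and r.start was never archived
        cursor = max(cursor, r.end)
        if cursor >= end:
            return True
    return cursor >= end


def plan_pitr(target: datetime, backups: Sequence[BaseBackup],
              wal: Sequence[WalRange]) -> Optional[BaseBackup]:
    """Return the base backup to restore for `target`, or None if PITR is impossible."""
    usable = [b for b in backups if b.stop <= target]
    if not usable:
        return None
    newest = max(usable, key=lambda b: b.stop)
    # An older backup needs this same WAL and more, so it cannot work around a gap.
    return newest if wal_covers(wal, newest.start, target) else None
```

Falling back to an older base backup cannot repair a WAL gap, because the older backup needs the same WAL plus more. That is why continuous, monitored archiving matters as much as the base backups themselves.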
In a single PostgreSQL instance, implementing PITR is relatively straightforward. In a distributed MPP environment, it is not. PITR requires:
- Coordinated WAL archiving across all segments
- Consistent base backups taken in a cluster-aware way
- Storage throughput sufficient to keep up with WAL volume
- Careful retention policies so that base backups and WAL stay available for the full recovery window you need
- Automated restore workflows so recovery is repeatable and fast
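In an MPP cluster the same check has to hold on every segment at once. The hypothetical Python sketch below takes the last successfully archived WAL time per segment content ID (which could be collected, for example, from each instance's `pg_stat_archiver` view) and shows that the newest point the whole cluster can reach is bounded by the slowest segment. It deliberately ignores distributed-transaction consistency, which cluster-aware tools typically handle by coordinating a common restore point across all segments.

```python
"""Bound the cluster-wide recovery point by the slowest segment (simplified model).

Keys are Greenplum-style content IDs (-1 is the coordinator/master); values are the
last time each instance successfully archived WAL. Collecting them is left out.
"""
from datetime import datetime, timedelta
from typing import List, Mapping


def cluster_recovery_horizon(last_archived: Mapping[int, datetime]) -> datetime:
    """Newest moment that every segment's archived WAL reaches."""
    if not last_archived:
        raise ValueError("no segments reported")
    return min(last_archived.values())


def lagging_segments(last_archived: Mapping[int, datetime], now: datetime,
                     rpo: timedelta) -> List[int]:
    """Content IDs whose archived WAL is older than the RPO allows."""
    return sorted(cid for cid, ts in last_archived.items() if now - ts > rpo)


if __name__ == "__main__":
    now = datetime(2024, 1, 15, 12, 0)
    # Hypothetical snapshot in which segment 1's archiving has stalled.
    snapshot = {-1: now - timedelta(minutes=2), 0: now - timedelta(minutes=4),
                1: now - timedelta(minutes=40)}
    print("cluster can be recovered up to", cluster_recovery_horizon(snapshot))
    print("segments breaching a 15-minute RPO:",
          lagging_segments(snapshot, now, timedelta(minutes=15)))
```

A single segment whose archiving stalls degrades the effective RPO of the entire cluster, which is why per-segment archive monitoring belongs in any MPP PITR runbook.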
This is where GPDR and WAL-G come into play.
GPDR: Enterprise-Grade PITR for Greenplum
Greenplum Disaster Recovery (GPDR) is Broadcom’s enterprise disaster-recovery framework for Greenplum. It is built on pgBackRest, giving it a robust and battle-tested engine for backup and restore, while adding the orchestration required for an MPP system.
GPDR provides:
- Parallel, coordinated base backups across the Greenplum master and all segments
- Continuous WAL archiving to enable PITR
- Compression and optional encryption of backup streams
- Retention management for both base backups and WAL
- Automated initialization of a DR cluster kept in sync with production
- Repeatable, scripted recovery workflows with enterprise support
The result is a warm standby environment that can be promoted with minimal downtime when disaster strikes. However, GPDR is currently available only for Greenplum.
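Retention management ties base backups and WAL together: once a base backup expires, WAL needed only by that backup can be pruned as well. The Python sketch below shows a simple count-based policy. pgBackRest, the engine underneath GPDR, offers a comparable count-based option (`repo1-retention-full`), but how GPDR exposes retention to operators is not shown here, and the names in the sketch are hypothetical.

```python
"""Count-based retention for base backups and the WAL they depend on (sketch).

Hypothetical names; this is not GPDR's or pgBackRest's API.
"""
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Backup:
    label: str
    start: datetime  # replay from this backup needs every WAL file from here onward


def apply_retention(backups: Sequence[Backup], keep_full: int
                    ) -> Tuple[List[Backup], List[Backup], Optional[datetime]]:
    """Split backups into (kept, expired) and return the WAL cutoff.

    WAL older than the cutoff is needed by no retained backup and may be pruned.
    """
    if keep_full < 1:
        raise ValueError("keep at least one full backup")
    ordered = sorted(backups, key=lambda b: b.start)
    kept = ordered[-keep_full:]
    expired = ordered[:-keep_full]
    wal_cutoff = kept[0].start if kept else None
    return kept, expired, wal_cutoff
```

With `keep_full=2` and weekly full backups, the recovery window reaches back roughly one to two weeks. WAL retention controls that window, not the RPO.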
WAL-G: Flexible PITR for WarehousePG, Cloudberry, and Postgres Variants
For the broader MPP ecosystem, WAL-G fills the PITR gap. WAL-G is an open-source backup and WAL archiving tool that supports:
- Vanilla PostgreSQL
- Greenplum
- Cloudberry
- EDB WarehousePG
- Other PostgreSQL derivatives
The community has extended WAL-G to understand Greenplum-style distributed architectures, including segment-aware backups and restores. For WarehousePG and Cloudberry deployments—where GPDR is not available—WAL-G becomes the primary option for PITR.
Compared to GPDR, WAL-G is:
- Lightweight and flexible, well suited to cloud and object storage (e.g., S3)
- Open-source, with a wide ecosystem and community support
- More operator-driven: orchestration and DR promotion scripts are your responsibility
- Less opinionated about how a DR cluster is built and maintained
WAL-G can absolutely deliver PITR and full-cluster restores, but it demands more operational discipline: coordinating segment backups, scripting recovery flows, and managing retention and promotion logic yourself.
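Because WAL-G leaves orchestration to the operator, a restore runbook usually needs a small driver that runs each step on every segment in parallel and refuses to continue if any segment fails. The Python sketch below shows that pattern with stub actions. In practice each action would invoke WAL-G on the segment host, and those commands are deliberately not spelled out here. The step and function names are hypothetical.

```python
"""Run each recovery step on every segment in parallel; stop on any failure (sketch).

The actions are stubs. In practice each would call WAL-G on the segment host;
those commands are intentionally not shown. All names here are hypothetical.
"""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Action = Callable[[int], bool]  # takes a content ID, returns True on success


class StepFailed(RuntimeError):
    """Raised when a step fails on at least one segment."""


def _safe(action: Action, content_id: int) -> bool:
    try:
        return bool(action(content_id))
    except Exception:
        # Treat an exception on one host as a failed segment; a real driver would log it.
        return False


def run_on_all_segments(step: str, content_ids: Iterable[int], action: Action,
                        max_workers: int = 8) -> Dict[int, bool]:
    """Run `action` for every content ID concurrently; raise if any of them fails."""
    ids: List[int] = list(content_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(zip(ids, pool.map(lambda cid: _safe(action, cid), ids)))
    failed = sorted(cid for cid, ok in results.items() if not ok)
    if failed:
        raise StepFailed(f"step {step!r} failed on content IDs {failed}")
    return results


def run_runbook(steps: Sequence[Tuple[str, Action]], content_ids: Iterable[int]) -> None:
    """Run steps in order; a StepFailed from any step aborts the remaining steps."""
    ids = list(content_ids)
    for name, action in steps:
        run_on_all_segments(name, ids, action)
```

Failing the whole step when any single segment fails matters in an MPP restore: a cluster with one segment recovered to a different point is inconsistent, so it is safer to stop and investigate than to promote it.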
gpbackup + PITR (GPDR or WAL-G) = Complete DR Strategy
At MPP scale, the strongest design is not an either/or decision. It is: gpbackup for logical portability plus GPDR or WAL-G for PITR and DR, depending on your platform.
| Requirement | gpbackup | GPDR (Greenplum) | WAL-G (WarehousePG / Cloudberry / others) | Best Practice |
|---|---|---|---|---|
| Logical, portable backups | ✅ | ❌ | ❌ | Always run gpbackup for archives and compliance |
| Point-in-time recovery (PITR) | ❌ | ✅ | ✅ | Use GPDR or WAL-G for PITR depending on platform |
| Warm DR standby cluster | ❌ | ✅ | ⚠️ Manual scripting required | GPDR for Greenplum; for WarehousePG/Cloudberry use WAL-G plus custom orchestration |
| Restore single tables or schemas | ✅ | ❌ | ❌ | Use gpbackup when you need object-level recovery |
| Full-cluster restore | ⚠️ Possible but slow at scale | ✅ Fast and coordinated | 🟡 Depends on configuration and scripting | Prefer PITR-based restores (GPDR or WAL-G) for large clusters |
| Compliance archiving | ✅ | ❌ | ❌ | gpbackup is still required for long-term logical archives |
Storage Planning: Critical for Both GPDR and WAL-G
Both GPDR and WAL-G rely heavily on the capabilities of the underlying storage and network. A PITR strategy is only as strong as the infrastructure behind it. Key questions include:
- Can the backup storage deduplicate and compress incoming data efficiently?
- Can it handle at least 2× the segment count in concurrent connections and streams?
- Is the storage network configured with an appropriate MTU and sufficient bandwidth?
- Do storage controllers have enough CPU to keep up with dedupe and compression?
- Is there enough capacity for both base backups and continuous WAL retention?
Without this planning, it is easy for backup and WAL pipelines to fall behind the rate of change in the database, silently eroding your effective recovery window.
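The storage checklist can be turned into rough arithmetic. The Python sketch below derives the minimum number of concurrent streams (the 2× segment count rule of thumb from the list), the sustained ingest rate needed to keep WAL archiving from falling behind, and the capacity needed for retained base backups plus WAL. The 1.5× headroom factor and all example inputs are assumptions; compression and dedupe ratios have to come from measuring your own data.

```python
"""Rough sizing checks for a PITR backup target (illustrative arithmetic only).

All inputs are assumptions to be replaced with measurements from your cluster.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    segments: int                 # primary segment instances streaming backups
    peak_wal_gb_per_hour: float   # peak WAL generated across the cluster
    base_backup_tb: float         # one base backup after compression/dedupe
    wal_tb_per_day: float         # retained WAL per day after compression/dedupe
    retention_days: int
    full_backups_kept: int


def min_concurrent_streams(w: Workload) -> int:
    """The checklist's rule of thumb: at least 2x the segment count."""
    return 2 * w.segments


def min_ingest_mb_per_s(w: Workload, headroom: float = 1.5) -> float:
    """Sustained write rate so WAL archiving keeps pace with peak change (headroom assumed)."""
    return w.peak_wal_gb_per_hour * 1000 / 3600 * headroom


def capacity_tb(w: Workload) -> float:
    """Space for retained base backups plus the WAL kept alongside them."""
    return w.full_backups_kept * w.base_backup_tb + w.wal_tb_per_day * w.retention_days


if __name__ == "__main__":
    # Hypothetical workload, not taken from the case study.
    w = Workload(segments=64, peak_wal_gb_per_hour=360, base_backup_tb=200,
                 wal_tb_per_day=5, retention_days=14, full_backups_kept=2)
    print(min_concurrent_streams(w), "streams,",
          f"{min_ingest_mb_per_s(w):.0f} MB/s,", f"{capacity_tb(w):.0f} TB")
```

If the measured sustained write rate of the backup target falls below the second figure during peak ingest, the archive backlog grows and the effective recovery window quietly shrinks.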
Conclusion: Modern DR Requires a Dual Approach
A complete DR plan for Greenplum-based and related MPP systems is not a single-tool decision. It is a strategy:
- gpbackup for logical backups, portability, and compliance
- GPDR for enterprise-grade PITR and DR in Greenplum
- WAL-G for PITR and DR in WarehousePG, Cloudberry, and other Postgres variants
Together, these layers provide both the granular repair that DBAs rely on day-to-day and the fast, consistent cluster-wide recovery that business continuity demands when something truly goes wrong.
Looking to Modernize Your MPP Disaster Recovery?
Whether you're running Greenplum, WarehousePG, or Cloudberry, implementing a reliable PITR + logical backup strategy requires careful planning. Our guest author and Mugnano Data Consulting can assist with:
- GPDR deployment and validation
- WAL-G PITR design for WarehousePG / Cloudberry
- Storage and WAL throughput sizing
- Backup automation and DR runbook creation
📩 Contact Greg Spiegelberg
🔧 Request a consulting engagement with MDC

