Rethinking Disaster Recovery for MPP Databases - Part 1

Nov 28

Why Massively Parallel Postgres (MPP) platforms like Greenplum, WarehousePG, and Cloudberry need both logical backups and modern Point-In-Time Recovery to meet real DR requirements.

Backup and disaster recovery for massively parallel Postgres systems like Greenplum, WarehousePG, and Cloudberry requires a different strategy than traditional single-node PostgreSQL. Logical tools such as gpbackup remain essential, but they cannot fully protect a distributed environment on their own.

For true end-to-end protection, modern MPP systems require a dual approach:

gpbackup: logical portability, compliance, table-level recoveries
PITR (GPDR or WAL-G): full-cluster consistency, timestamp recovery, DR-ready replicas

Bottom line: Logical backups are still encouraged, but at MPP scale they must be paired with a Point-In-Time Recovery strategy if you want predictable, fast, and consistent disaster recovery.

Why Logical Backups Alone Are Not Enough

As clusters grow into the multi-terabyte or petabyte range, tools like pg_dump and even gpbackup begin to hit hard limits:

High I/O and CPU load during parallel table unloads
Lock contention that can slow or block user workloads
Long backup windows that don’t fit 24×7 ingest pipelines
No ability to restore the system to a specific timestamp
No concept of a warm standby or DR replica ready to be promoted

None of this means gpbackup is obsolete. It means that in an MPP world, logical backups are necessary but not sufficient if your goal is full cluster disaster recovery with tight RPO/RTO objectives.

A Recent Use Case

The case for GPDR was clear for a large finance and banking customer. Full logical backups of their 800 TB Greenplum data warehouse took 48 hours to complete, and a restore took just as long. That left them exposed to at least two days of downtime during a DR event, plus significant data loss. After careful analysis, and once infrastructure to support it had been identified, a GPDR implementation reduced both RTO and RPO to 15 minutes. That satisfied regulatory and internal requirements, and it greatly reduced lock contention, which yielded a significant performance gain.
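
To put those figures in perspective, here is a small back-of-the-envelope sketch in Python. It uses only the numbers from the use case above (800 TB, 48 hours, 15 minutes). The function name and the decimal-terabyte convention are assumptions made for illustration.

```python
# Back-of-the-envelope arithmetic for the use case above.
# Assumption: 1 TB = 10**12 bytes (decimal units).
TB = 10**12
GB = 10**9


def sustained_throughput_gb_per_s(size_tb, hours):
    """Average rate needed to move size_tb terabytes in the given hours."""
    return size_tb * TB / (hours * 3600) / GB


# 800 TB written (or read back) in a 48-hour window.
backup_rate = sustained_throughput_gb_per_s(800, 48)

# Restore time dropping from 48 hours to 15 minutes.
rto_improvement = (48 * 60) / 15

print(f"Average backup rate: {backup_rate:.2f} GB/s")
print(f"RTO improvement factor: {rto_improvement:.0f}x")
```

Even at several gigabytes per second sustained, a full logical backup of this size cannot fit inside a recovery objective measured in minutes, which is the gap PITR is meant to close.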

The Essential Role of gpbackup

Even in large analytics platforms, gpbackup remains indispensable:

Schema-level and table-level restores
Logical dumps for audits and long-term archives
Data portability between clusters, environments, and regions
Controlled, object-level repair operations
Compliance and retention requirements that expect logical copies

gpbackup is the logical safety net that complements PITR. It is the right tool when you need to restore just a subset of data, keep long-term archives, or move data between environments in a controlled, application-aware way.

Why Point-In-Time Recovery Becomes Mandatory in MPP Systems

Point-In-Time Recovery (PITR) addresses the biggest gaps left by logical backups:

A coordinated base backup across all segments
Continuous, low-impact archiving of Write-Ahead Logs (WAL)
The ability to rewind a cluster to a specific moment in time
Higher ingest tolerance with minimal locking overhead
Fast, repeatable cluster recovery after an incident
Support for warm or hot disaster recovery environments

In a single PostgreSQL instance, implementing PITR is relatively straightforward. In a distributed MPP environment, it is not. PITR requires:

Coordinated WAL archiving across all segments
Consistent base backups taken in a cluster-aware way
Storage throughput sufficient to keep up with WAL volume
Careful retention policies so WAL is available up to the desired RPO
Automated restore workflows so recovery is repeatable and fast
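
The coordination and retention requirements above can be illustrated with a minimal Python sketch. It assumes the PITR tooling records named, cluster-wide restore points and that you can list which of them each segment has fully archived. The data layout, segment names, restore point names, and timestamps are all hypothetical and do not reflect the GPDR or WAL-G interfaces.

```python
from datetime import datetime


def consistent_restore_points(archived_by_segment):
    """Return restore points archived on every segment, newest first.

    archived_by_segment maps a segment id to {restore_point_name: created_at}.
    A point is usable for a cluster-wide restore only if every segment
    (coordinator included) has archived WAL up to that point.
    """
    if not archived_by_segment:
        return []
    segment_points = list(archived_by_segment.values())
    common = set(segment_points[0])
    for points in segment_points[1:]:
        common &= set(points)
    first = segment_points[0]
    return sorted(common, key=lambda name: first[name], reverse=True)


def achieved_rpo_seconds(archived_by_segment, now):
    """Seconds between 'now' and the newest cluster-consistent restore point."""
    points = consistent_restore_points(archived_by_segment)
    if not points:
        return None  # nothing is recoverable cluster-wide
    newest = next(iter(archived_by_segment.values()))[points[0]]
    return (now - newest).total_seconds()


# Hypothetical archive state: seg1 has fallen behind and lacks rp_1215.
t1200 = datetime(2024, 1, 1, 12, 0)
t1215 = datetime(2024, 1, 1, 12, 15)
archive = {
    "coordinator": {"rp_1200": t1200, "rp_1215": t1215},
    "seg0": {"rp_1200": t1200, "rp_1215": t1215},
    "seg1": {"rp_1200": t1200},
}
now = datetime(2024, 1, 1, 12, 20)

print(consistent_restore_points(archive))
print(achieved_rpo_seconds(archive, now))
```

In this example a single lagging segment pushes the effective recovery point back to the older restore point, which is why WAL archiving has to be monitored per segment and not only on the coordinator.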

This is where GPDR and WAL-G come into play.

GPDR: Enterprise-Grade PITR for Greenplum

Greenplum Disaster Recovery (GPDR) is Broadcom’s enterprise disaster-recovery framework for Greenplum. It is built on pgBackRest, giving it a robust and battle-tested engine for backup and restore, while adding the orchestration required for an MPP system.

GPDR provides:

Parallel, coordinated base backups across the Greenplum master and all segments
Continuous WAL archiving to enable PITR
Compression and optional encryption of backup streams
Retention management for both base backups and WAL
Automated initialization of a DR cluster kept in sync with production
Repeatable, scripted recovery workflows with enterprise support

The result is a warm standby environment that can be promoted with minimal downtime when disaster strikes. However, GPDR is currently available only for Greenplum.

WAL-G: Flexible PITR for WarehousePG, Cloudberry, and Postgres Variants

For the broader MPP ecosystem, WAL-G fills the PITR gap. WAL-G is an open-source backup and WAL archiving tool that supports:

Vanilla PostgreSQL
Greenplum
Cloudberry
EDB WarehousePG
Other PostgreSQL derivatives

The community has extended WAL-G to understand Greenplum-style distributed architectures, including segment-aware backups and restores. For WarehousePG and Cloudberry deployments, where GPDR is not available, WAL-G becomes the primary option for PITR.

Compared to GPDR, WAL-G is:

Lightweight and flexible, well suited to cloud and object storage (e.g., S3)
Open-source, with a wide ecosystem and community support
More operator-driven: orchestration and DR promotion scripts are your responsibility
Less opinionated about how a DR cluster is built and maintained

WAL-G can absolutely deliver PITR and full-cluster restores, but it demands more operational discipline: coordinating segment backups, scripting recovery flows, and managing retention and promotion logic yourself.

gpbackup + PITR (GPDR or WAL-G) = Complete DR Strategy

At MPP scale, the strongest design is not an either/or decision. It is: gpbackup for logical portability plus GPDR or WAL-G for PITR and DR, depending on your platform.

Requirement	gpbackup	GPDR (Greenplum)	WAL-G (WarehousePG / Cloudberry / others)	Best Practice
Logical, portable backups	✅	❌	❌	Always run gpbackup for archives and compliance
Point-in-time recovery (PITR)	❌	✅	✅	Use GPDR or WAL-G for PITR depending on platform
Warm DR standby cluster	❌	✅	⚠️ Manual scripting required	GPDR for Greenplum; for WarehousePG/Cloudberry use WAL-G plus custom orchestration
Restore single tables or schemas	✅	❌	❌	Use gpbackup when you need object-level recovery
Full-cluster restore	⚠️ Possible but slow at scale	✅ Fast and coordinated	🟡 Depends on configuration and scripting	Prefer PITR-based restores (GPDR or WAL-G) for large clusters
Compliance archiving	✅	❌	❌	gpbackup is still required for long-term logical archives

Storage Planning: Critical for Both GPDR and WAL-G

Both GPDR and WAL-G rely heavily on the capabilities of the underlying storage and network. A PITR strategy is only as strong as the infrastructure behind it. Key questions include:

Can the backup storage deduplicate and compress incoming data efficiently?
Can it handle at least 2× the segment count in concurrent connections and streams?
Is the storage network configured with an appropriate MTU and sufficient bandwidth?
Do storage controllers have enough CPU to keep up with dedupe and compression?
Is there enough capacity for both base backups and continuous WAL retention?

Without this planning, it is easy for backup and WAL pipelines to fall behind the rate of change in the database, silently eroding your effective recovery window.
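
As a rough starting point for these questions, the following Python sketch estimates concurrent streams, retained capacity, and sustained WAL ingest. The class, its field names, and every input value in the example are hypothetical placeholders. Only the rule of thumb of 2× the segment count for concurrent streams comes from the list above. Real sizing should use measured WAL rates and the compression and deduplication ratios your storage actually achieves.

```python
from dataclasses import dataclass


@dataclass
class PitrSizing:
    """Rough PITR storage estimate; all inputs are assumptions you supply."""
    segment_count: int
    base_backup_tb: float           # one base backup, before compression
    retained_base_backups: int
    wal_tb_per_day: float           # measured WAL generation rate
    wal_retention_days: int
    compression_ratio: float = 1.0  # 2.0 means data shrinks to half its size

    def required_streams(self) -> int:
        # Rule of thumb from the checklist: at least 2x the segment count.
        return 2 * self.segment_count

    def capacity_tb(self) -> float:
        # Base backups plus retained WAL, reduced by the compression ratio.
        raw = (self.base_backup_tb * self.retained_base_backups
               + self.wal_tb_per_day * self.wal_retention_days)
        return raw / self.compression_ratio

    def wal_ingest_mb_per_s(self) -> float:
        # Average rate the archive must sustain to avoid falling behind
        # (decimal units: 1 TB = 1,000,000 MB; 86,400 seconds per day).
        return self.wal_tb_per_day * 1_000_000 / 86_400


# Hypothetical cluster: 96 segments, 800 TB base backup, 10 TB of WAL per day.
plan = PitrSizing(
    segment_count=96,
    base_backup_tb=800,
    retained_base_backups=2,
    wal_tb_per_day=10,
    wal_retention_days=7,
    compression_ratio=2.0,
)

print(plan.required_streams(), "concurrent streams")
print(plan.capacity_tb(), "TB of backup capacity")
print(round(plan.wal_ingest_mb_per_s(), 1), "MB/s average WAL ingest")
```

The average WAL rate is a floor, not a target. Peak ingest during batch loads can be several times higher, so the archive path needs headroom above this figure.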

Conclusion: Modern DR Requires a Dual Approach

A complete DR plan for Greenplum-based and related MPP systems is not a single tool decision. It is a strategy:

gpbackup for logical backups, portability, and compliance
GPDR for enterprise-grade PITR and DR in Greenplum
WAL-G for PITR and DR in WarehousePG, Cloudberry, and other Postgres variants

Together, these layers provide both the granular repair that DBAs rely on day-to-day and the fast, consistent cluster-wide recovery that business continuity demands when something truly goes wrong.

Looking to Modernize Your MPP Disaster Recovery?

Whether you're running Greenplum, WarehousePG, or Cloudberry, implementing a reliable PITR + logical backup strategy requires careful planning. Our guest author and Mugnano Data Consulting can assist with:

GPDR deployment and validation
WAL-G PITR design for WarehousePG / Cloudberry
Storage and WAL throughput sizing
Backup automation and DR runbook creation

📩 Contact Greg Spiegelberg
🔧 Request a consulting engagement with MDC


Greg Spiegelberg