Rethinking Disaster Recovery for MPP Databases

Why massively parallel processing (MPP) Postgres platforms like Greenplum, WarehousePG, and Cloudberry need both logical backups and modern Point-In-Time Recovery to meet real DR requirements.

Backup and disaster recovery for massively parallel Postgres systems like Greenplum, WarehousePG, and Cloudberry requires a different strategy than traditional single-node PostgreSQL. Logical tools such as gpbackup remain essential, but they cannot fully protect a distributed environment on their own.

For true end-to-end protection, modern MPP systems require a dual approach:

  • gpbackup: logical portability, compliance, table-level recoveries
  • PITR (GPDR or WAL-G): full-cluster consistency, timestamp recovery, DR-ready replicas

Bottom line: Logical backups are still encouraged, but at MPP scale they must be paired with a Point-In-Time Recovery strategy if you want predictable, fast, and consistent disaster recovery.

Why Logical Backups Alone Are Not Enough

As clusters grow into the multi-terabyte or petabyte range, tools like pg_dump and even gpbackup begin to hit hard limits:

  • High I/O and CPU load during parallel table unloads
  • Lock contention that can slow or block user workloads
  • Long backup windows that don’t fit 24×7 ingest pipelines
  • No ability to restore the system to a specific timestamp
  • No concept of a warm standby or DR replica ready to be promoted

None of this means gpbackup is obsolete. It means that in an MPP world, logical backups are necessary but not sufficient if your goal is full cluster disaster recovery with tight RPO/RTO objectives.

A Recent Use Case

The case for GPDR was clear for a large finance and banking customer. Full logical backups of their 800 TB Greenplum data warehouse took 48 hours to complete, and a restore took just as long. That exposed them to at least two days of downtime during a DR event, plus significant data loss. After careful analysis, and once infrastructure to support it had been identified, the GPDR implementation reduced both RTO and RPO to just 15 minutes, satisfying both regulatory and internal requirements. It also greatly reduced lock contention, yielding a significant performance increase.
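To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal terabytes (the case study does not say TB or TiB) and a perfectly steady transfer rate, so treat the output as an order-of-magnitude guide rather than a measurement. The function names are illustrative only.

```python
TB = 10**12  # assumption: decimal terabytes
GB = 10**9


def sustained_rate_gb_per_s(size_tb: float, hours: float) -> float:
    """Average throughput (GB/s) needed to move size_tb terabytes in `hours`."""
    return size_tb * TB / (hours * 3600) / GB


def improvement_factor(before_hours: float, after_minutes: float) -> float:
    """How many times shorter the new recovery objective is than the old one."""
    return before_hours * 60 / after_minutes


if __name__ == "__main__":
    # 800 TB in 48 hours, as in the case study above.
    print(f"{sustained_rate_gb_per_s(800, 48):.2f} GB/s sustained")
    # 48 hours down to 15 minutes.
    print(f"{improvement_factor(48, 15):.0f}x shorter recovery objective")
```

Even at roughly 4.6 GB/s sustained, the full logical backup still fills the entire 48-hour window, which is why shrinking the window means changing the method rather than adding hardware.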

Figure: Average Query Lock Time

The Essential Role of gpbackup

Even in large analytics platforms, gpbackup remains indispensable:

  • Schema-level and table-level restores
  • Logical dumps for audits and long-term archives
  • Data portability between clusters, environments, and regions
  • Controlled, object-level repair operations
  • Compliance and retention requirements that expect logical copies

gpbackup is the logical safety net that complements PITR. It is the right tool when you need to restore just a subset of data, keep long-term archives, or move data between environments in a controlled, application-aware way.
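As a minimal sketch of the table-level case, the helpers below only build the command lines for backing up and restoring a single table; they do not run anything. The flags shown (`--dbname`, `--include-table`, `--backup-dir`, `--timestamp`) are the commonly documented gpbackup and gprestore options, but verify them against the version installed on your cluster. The database, table, and path names are hypothetical.

```python
from typing import List


def gpbackup_table_args(dbname: str, table: str, backup_dir: str) -> List[str]:
    """Build argv for a single-table gpbackup run (table is schema.table)."""
    return ["gpbackup", "--dbname", dbname,
            "--include-table", table,
            "--backup-dir", backup_dir]


def gprestore_table_args(timestamp: str, table: str, backup_dir: str) -> List[str]:
    """Build argv for restoring one table from a backup set.

    Backup sets are identified by a 14-digit YYYYMMDDHHMMSS timestamp.
    """
    if len(timestamp) != 14 or not timestamp.isdigit():
        raise ValueError("timestamp must be 14 digits (YYYYMMDDHHMMSS)")
    return ["gprestore", "--timestamp", timestamp,
            "--include-table", table,
            "--backup-dir", backup_dir]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(" ".join(gpbackup_table_args("sales", "public.orders", "/backups")))
    print(" ".join(gprestore_table_args("20240101120000", "public.orders", "/backups")))
```

Passing the resulting list to `subprocess.run(..., check=True)` would execute it; keeping construction and execution separate makes the runbook easier to review and test.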

Why Point-In-Time Recovery Becomes Mandatory in MPP Systems

Point-In-Time Recovery (PITR) addresses the biggest gaps left by logical backups:

  • A coordinated base backup across all segments
  • Continuous, low-impact archiving of Write-Ahead Logs (WAL)
  • The ability to rewind a cluster to a specific moment in time
  • Higher ingest tolerance with minimal locking overhead
  • Fast, repeatable cluster recovery after an incident
  • Support for warm or hot disaster recovery environments

In a single PostgreSQL instance, implementing PITR is relatively straightforward. In a distributed MPP environment, it is not. PITR requires:

  • Coordinated WAL archiving across all segments
  • Consistent base backups taken in a cluster-aware way
  • Storage throughput sufficient to keep up with WAL volume
  • Careful retention policies so WAL is available up to the desired RPO
  • Automated restore workflows so recovery is repeatable and fast
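The retention and coordination requirements above can be illustrated with a deliberately simplified model. It assumes WAL is retained without gaps from every listed base backup onward, and that you can obtain, per segment, the time through which WAL has been archived. The cluster can only be recovered to a point that every segment has reached, so the slowest segment sets the limit. All names here are hypothetical; this is not how GPDR or WAL-G expose the information.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BaseBackup:
    label: str
    finished_at: datetime


def pick_base_backup(backups: List[BaseBackup],
                     archived_through: Dict[int, datetime],
                     target: datetime) -> Optional[BaseBackup]:
    """Return the newest base backup usable for recovering to `target`.

    Returns None when the target is not recoverable: either some segment
    has not archived WAL that far, or no base backup finished before it.
    """
    if not archived_through:
        return None
    # The cluster-wide recovery horizon is set by the slowest segment.
    horizon = min(archived_through.values())
    if target > horizon:
        return None
    candidates = [b for b in backups if b.finished_at <= target]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b.finished_at)
```

A check like this belongs in monitoring as well as in the restore runbook: if the horizon stops advancing on any one segment, the effective RPO is already growing.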

This is where GPDR and WAL-G come into play.

GPDR: Enterprise-Grade PITR for Greenplum

Greenplum Disaster Recovery (GPDR) is Broadcom’s enterprise disaster-recovery framework for Greenplum. It is built on pgBackRest, giving it a robust and battle-tested engine for backup and restore, while adding the orchestration required for an MPP system.

GPDR provides:

  • Parallel, coordinated base backups across the Greenplum master and all segments
  • Continuous WAL archiving to enable PITR
  • Compression and optional encryption of backup streams
  • Retention management for both base backups and WAL
  • Automated initialization of a DR cluster kept in sync with production
  • Repeatable, scripted recovery workflows with enterprise support

The result is a warm standby environment that can be promoted with minimal downtime when disaster strikes. However, GPDR is currently available only for Greenplum.

WAL-G: Flexible PITR for WarehousePG, Cloudberry, and Postgres Variants

For the broader MPP ecosystem, WAL-G fills the PITR gap. WAL-G is an open-source backup and WAL archiving tool that supports:

  • Vanilla PostgreSQL
  • Greenplum
  • Cloudberry
  • EDB WarehousePG
  • Other PostgreSQL derivatives

The community has extended WAL-G to understand Greenplum-style distributed architectures, including segment-aware backups and restores. For WarehousePG and Cloudberry deployments, where GPDR is not available, WAL-G becomes the primary option for PITR.

Compared to GPDR, WAL-G is:

  • Lightweight and flexible, well suited to cloud and object storage (e.g., S3)
  • Open-source, with a wide ecosystem and community support
  • More operator-driven: orchestration and DR promotion scripts are your responsibility
  • Less opinionated about how a DR cluster is built and maintained

WAL-G can absolutely deliver PITR and full-cluster restores, but it demands more operational discipline: coordinating segment backups, scripting recovery flows, and managing retention and promotion logic yourself.
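To give a feel for the orchestration you would own, here is a minimal sketch of an all-or-nothing wrapper around per-segment backups. The `backup_one` callable is a hypothetical stand-in for whatever actually triggers the backup on one segment; no real WAL-G command or API is shown or implied. Using -1 for the coordinator follows the usual Greenplum content-id convention and is an assumption here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def backup_all_segments(segment_ids: List[int],
                        backup_one: Callable[[int], None],
                        max_workers: int = 8) -> Dict[int, str]:
    """Run backup_one for every segment in parallel.

    Returns a dict of segment id -> error message for each failure.
    An empty dict means every segment succeeded.
    """
    failures: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {seg: pool.submit(backup_one, seg) for seg in segment_ids}
        for seg, future in futures.items():
            try:
                future.result()
            except Exception as exc:  # collect, do not abort the others
                failures[seg] = str(exc)
    return failures


def backup_is_usable(failures: Dict[int, str]) -> bool:
    """A cluster backup only counts if no segment failed."""
    return not failures
```

The design choice worth noting is that a partial success is treated as a failure: a backup set missing even one segment cannot restore a consistent cluster, so the script should report it as unusable and alert.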

gpbackup + PITR (GPDR or WAL-G) = Complete DR Strategy

At MPP scale, the strongest design is not an either/or decision. It is: gpbackup for logical portability plus GPDR or WAL-G for PITR and DR, depending on your platform.

Requirement | gpbackup | GPDR (Greenplum) | WAL-G (WarehousePG / Cloudberry / others) | Best Practice
Logical, portable backups | | | | Always run gpbackup for archives and compliance
Point-in-time recovery (PITR) | | | | Use GPDR or WAL-G for PITR depending on platform
Warm DR standby cluster | | | ⚠️ Manual scripting required | GPDR for Greenplum; for WarehousePG/Cloudberry use WAL-G plus custom orchestration
Restore single tables or schemas | | | | Use gpbackup when you need object-level recovery
Full-cluster restore | ⚠️ Possible but slow at scale | ✅ Fast and coordinated | 🟡 Depends on configuration and scripting | Prefer PITR-based restores (GPDR or WAL-G) for large clusters
Compliance archiving | | | | gpbackup is still required for long-term logical archives

Storage Planning: Critical for Both GPDR and WAL-G

Both GPDR and WAL-G rely heavily on the capabilities of the underlying storage and network. A PITR strategy is only as strong as the infrastructure behind it. Key questions include:

  • Can the backup storage deduplicate and compress incoming data efficiently?
  • Can it handle at least 2× the segment count in concurrent connections and streams?
  • Is the storage network configured with an appropriate MTU and sufficient bandwidth?
  • Do storage controllers have enough CPU to keep up with dedupe and compression?
  • Is there enough capacity for both base backups and continuous WAL retention?

Without this planning, it is easy for backup and WAL pipelines to fall behind the rate of change in the database, silently eroding your effective recovery window.
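A few of these questions reduce to simple arithmetic. The sketch below is a sizing aid under stated assumptions: steady WAL generation, steady archive throughput, and a single combined dedupe/compression ratio that you supply from your own storage measurements. The function and parameter names are illustrative.

```python
def min_concurrent_streams(segment_count: int) -> int:
    """Rule of thumb from the list above: at least 2x the segment count."""
    return 2 * segment_count


def archive_lag_gb(wal_gb_per_hour: float,
                   archive_gb_per_hour: float,
                   hours: float) -> float:
    """WAL backlog (GB) accumulated if archiving is slower than generation."""
    return max(0.0, (wal_gb_per_hour - archive_gb_per_hour) * hours)


def required_capacity_tb(base_backup_tb: float,
                         base_backups_kept: int,
                         wal_gb_per_hour: float,
                         wal_retention_hours: float,
                         reduction_ratio: float = 1.0) -> float:
    """Capacity (TB) for retained base backups plus retained WAL.

    reduction_ratio is the assumed combined dedupe/compression factor
    (2.0 means the data is stored at half its logical size).
    """
    wal_tb = wal_gb_per_hour * wal_retention_hours / 1000
    raw_tb = base_backup_tb * base_backups_kept + wal_tb
    return raw_tb / reduction_ratio
```

For example, if WAL is generated at 500 GB/hour but archived at only 400 GB/hour, `archive_lag_gb(500, 400, 24)` shows a 2,400 GB backlog after one day, which is exactly the silent erosion of the recovery window described above.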

Conclusion: Modern DR Requires a Dual Approach

A complete DR plan for Greenplum-based and related MPP systems is not a single tool decision. It is a strategy:

  • gpbackup for logical backups, portability, and compliance
  • GPDR for enterprise-grade PITR and DR in Greenplum
  • WAL-G for PITR and DR in WarehousePG, Cloudberry, and other Postgres variants

Together, these layers provide both the granular repair that DBAs rely on day-to-day and the fast, consistent cluster-wide recovery that business continuity demands when something truly goes wrong.

Looking to Modernize Your MPP Disaster Recovery?

Whether you're running Greenplum, WarehousePG, or Cloudberry, implementing a reliable PITR + logical backup strategy requires careful planning. Our guest author and Mugnano Data Consulting can assist with:

  • GPDR deployment and validation
  • WAL-G PITR design for WarehousePG / Cloudberry
  • Storage and WAL throughput sizing
  • Backup automation and DR runbook creation

📩 Contact Greg Spiegelberg
🔧 Request a consulting engagement with MDC
