Kickstarter: Automating Backup, Replication & Restore in Greenplum
This toolset is a core component of Mugnano Data Consulting’s DBA Enablement service offering and ships in our DBA Operations Kickstarter suite, providing turnkey practices and tooling for Greenplum operations.
Disaster Recovery isn’t just “take a backup.” At MPP scale, you need a repeatable end-to-end flow that tags each backup, replicates it to DR, and restores it with minimal coordination and downtime. This post walks through the DBA Operations Kickstarter framework that wraps gpbackup, gpbackup_manager, and gprestore into a fully automated pipeline.
Why this matters: We tag each backup with a timestamp, then use that tag to drive replication and restore so the entire DR flow is deterministic and scriptable.
High-Level Approach
- Backup: Separate METADATA (catalog) from DATA to reduce lock time, then parallelize data sets.
- Replicate: Ship artifacts to DR (e.g., Data Domain) using configurable interfaces.
- Restore: Read the latest “marker file” for the backup timestamp and rehydrate the target DB.
Why Wrapper Scripts Are Necessary
Vendor tools are excellent, but at scale you hit operational realities the wrappers solve:
- Parallelism vs. Single-File Backups: Backup appliances (e.g., Data Domain) perform best with fewer, larger files, so the single-data-file option is preferred. That disables the built-in parallel jobs flag—our wrappers reintroduce safe parallelism by splitting work into logical backup tags (e.g., by schema) and coordinating them via auto-generated Makefiles.
- Timestamp Collisions: gpbackup uses timestamps as the backup key, so parallel runs can collide if they start in the same second. The wrappers serialize tag start times (a predecessor check plus a “running” marker) to guarantee unique timestamps and to propagate the correct TS downstream to replicate/restore; see the sketch after this list.
- Thread Limits & Race Conditions: Data Domain can have different limits for write/read/replication threads. Wrappers decouple backup, replicate, and restore into independent atomic units so each stage can saturate its own thread pool, and they add backoff/retry to avoid rare collisions.
- Deterministic DR: Marker files (<ts>|<location>|<DD|DIR>) make the entire chain deterministic. Replicate/restore run in wait mode and wake exactly when the right marker appears.
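Here is a minimal sketch of how a predecessor check plus a “running” marker can serialize tag start times. The marker directory, file names, and arguments are illustrative assumptions, not the Kickstarter scripts’ actual internals; a production version would also need an atomic lock.
# Hypothetical serialization sketch: ensure no two gpbackup tags start in the same second.
MARKER_DIR="$HOME/gptools/backup_restore/markers"   # assumed marker location
TAG="$1"                                            # logical backup tag, e.g. a schema name
DBNAME="$2"                                         # target database
# Predecessor check: wait until no other tag is inside its startup window.
while ls "$MARKER_DIR"/*.running >/dev/null 2>&1; do
  sleep 1
done
touch "$MARKER_DIR/$TAG.running"                    # claim the startup window
gpbackup --dbname "$DBNAME" --include-schema "$TAG" --single-data-file &   # start this tag's backup
sleep 1                                             # the next tag now gets a later, unique timestamp
rm -f "$MARKER_DIR/$TAG.running"                    # release the window
wait                                                # wait for this tag's gpbackup to finish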
Key Components
1) Environment Configuration
Global and per-database shell vars live under ~/gptools/backup_restore/conf:
- env_vars.sh – cluster-wide defaults (paths, marker dir, plugin slots).
- <dbname>_vars.sh – DB-specific overrides (interfaces, target DB names, etc.).
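As a rough illustration, the split might look like this; every variable name below is a hypothetical example, not the actual contents of the Kickstarter configuration files.
# env_vars.sh -- hypothetical cluster-wide defaults (names are illustrative assumptions)
export BACKUP_BASE_DIR=/data/backups                          # local staging area for backup artifacts
export MARKER_DIR=$HOME/gptools/backup_restore/markers        # where marker files are written
export PLUGIN_CONF_DIR=$HOME/gptools/backup_restore/conf/yml  # generated Data Domain YAMLs

# <dbname>_vars.sh -- hypothetical per-database overrides, sourced after env_vars.sh
export TARGET_DB=analytics                     # database this config applies to
export RESTORE_DB=analytics_dr                 # name to restore into on the DR cluster
export DD_INTERFACE=dd-backup01.example.com    # Data Domain interface used for this DB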
2) Data Domain Integration
If you use Dell EMC Data Domain, YAML plugin files define local and remote targets, streams, and optional “restore-from” settings. Passwords are encrypted with gpbackup_manager encrypt-password (stored via pgcrypto), and a helper (gen_backup_ymls.sh) generates consistent YAMLs per environment.
Password tip: encrypt once, then copy the hidden .encrypt key from $MASTER_DATA_DIRECTORY to all hosts that must decrypt it.
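A minimal sketch of that tip, assuming the --plugin-config flag documented for gpbackup_manager; the YAML path and the DR host name are hypothetical placeholders.
# Encrypt the Data Domain password referenced by the plugin YAML (the command prompts for it).
gpbackup_manager encrypt-password --plugin-config ~/gptools/backup_restore/conf/yml/ddboost_local.yml

# Copy the hidden key to every host that must decrypt it, e.g. the DR master.
# Host name and destination path below are illustrative assumptions.
scp "$MASTER_DATA_DIRECTORY/.encrypt" dr-master.example.com:"$MASTER_DATA_DIRECTORY/"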
3) Backup Set Configuration (SQL)
Declare how to split and parallelize backups in a single table, dbaconfig.backup_config.
The framework auto-generates Makefiles on each run—no manual edits required.
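The column layout below is a hypothetical sketch of what such a configuration table could hold; the real dbaconfig.backup_config schema may differ.
# Hypothetical sketch only; the actual dbaconfig.backup_config columns may differ.
psql -d <dbname> <<'SQL'
CREATE TABLE dbaconfig.backup_config (
    dbname        text,   -- database the rule applies to
    backup_type   text,   -- daily | weekly | monthly | yearly
    backup_tag    text,   -- logical unit of work, e.g. a schema name
    include_list  text,   -- schemas/tables that belong to this tag
    parallel_slot int     -- parallel slot the tag runs in
);
INSERT INTO dbaconfig.backup_config
VALUES ('analytics', 'daily', 'sales', 'sales', 1);
SQL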
4) Marker Files (Orchestration)
Each backup writes a “marker” with the backup timestamp and location: <timestamp>|<plugin-or-dir>|<DD|DIR>. Replication and restore read these markers to know exactly what to act on, enabling hands-off DR chaining.
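For illustration, a marker might look like the commented line below, and a wait-mode consumer could poll for it roughly like this; the file name, directory, and polling logic are assumptions, not the actual scripts.
# Hypothetical marker content: gpbackup timestamp | plugin config or directory | storage type
# 20240115093045|/home/gpadmin/gptools/backup_restore/conf/yml/ddboost_local.yml|DD

# Minimal wait-mode loop: sleep until the marker appears, then act on its timestamp.
MARKER_DIR=$HOME/gptools/backup_restore/markers     # assumed location
while [ ! -s "$MARKER_DIR/<dbname>_daily.marker" ]; do
  sleep 60
done
TS=$(cut -d'|' -f1 "$MARKER_DIR/<dbname>_daily.marker")
echo "Acting on backup timestamp $TS"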
Running the Pipeline
Backup
# Backup specific type/db; runs METADATA, then parallel DATA sets
./backup.sh -t <all|daily|weekly|monthly|yearly> -d <dbname>
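Under the hood, a wrapper like this plausibly issues gpbackup in two phases; the exact invocation is internal to the scripts, but with documented gpbackup flags the split looks roughly like:
# Phase 1: catalog only, to keep lock time short.
gpbackup --dbname <dbname> --metadata-only --plugin-config <ddboost_yaml>

# Phase 2: one data-only run per backup tag, single-data-file for appliance efficiency.
gpbackup --dbname <dbname> --data-only --include-schema <schema_tag> \
         --single-data-file --plugin-config <ddboost_yaml>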
Replication
# Wait mode: sits idle until a new marker arrives, then replicates that TS
./replicate.sh -t <type> -d <dbname> --wait
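For Data Domain targets, copying a completed backup to the remote appliance is what gpbackup_manager’s replicate-backup command does, so the wrapper presumably drives something like the following; the timestamp and YAML path are placeholders filled in from the marker file.
# Replicate the backup set identified by the timestamp read from the marker file.
gpbackup_manager replicate-backup <timestamp> --plugin-config <ddboost_yaml>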
Restore
# Wait mode: restores when backup marker arrives; runs METADATA, then DATA
./restore.sh -t <type> -d <dbname> --wait
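The restore side mirrors the backup split; with standard gprestore flags the two phases look roughly like this, with the timestamp again taken from the marker file.
# Phase 1: rebuild the catalog first.
gprestore --timestamp <timestamp> --metadata-only --create-db --plugin-config <ddboost_yaml>

# Phase 2: load data per tag, in parallel.
gprestore --timestamp <timestamp> --data-only --include-schema <schema_tag> --plugin-config <ddboost_yaml>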
Process Visuals
[Flow diagrams of the backup, replicate, and restore stages]
Monitoring & Control
Use dr_process_manager.sh to view running PIDs, kill/restart by tag, and launch ad-hoc schema/table restores.
Scheduling & Validation
- Run validate_backup_config.sh before scheduled windows.
- Email results via rpt_validate_backup_config.sh.
- Include the marker archive directory in your retention policy.
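As one possible scheduling layout (crontab times, script locations, and log paths here are assumptions; the invocations match the ones shown above):
# Hypothetical gpadmin crontab: validate, then kick off the nightly backup.
30 22 * * * ~/gptools/backup_restore/validate_backup_config.sh >> ~/gptools/logs/validate.log 2>&1
0 23 * * * ~/gptools/backup_restore/backup.sh -t daily -d <dbname> >> ~/gptools/logs/backup.log 2>&1
# Long-running wait-mode consumers are started once (e.g. at boot), not from cron:
# ./replicate.sh -t daily -d <dbname> --wait
# ./restore.sh -t daily -d <dbname> --wait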
Conclusion
Automating backup, replication, and restore turns DR from a manual “hope it works” exercise into a repeatable, observable pipeline. The wrapper scripts give you determinism (timestamped markers), performance (parallel tags with single-file efficiency), and control (Makefile orchestration and process management) on top of Greenplum’s native tools.
- Faster & safer: Metadata/Data split minimizes lock time while keeping restores predictable.
- Production-ready: Central SQL config (backup_config) + marker files + validators.
- Operable: dr_process_manager.sh to monitor, kill, or restart by tag when you need to intervene.