Greenplum Architecture Assessment Automation

This toolset is a core component and delivery of all services performed by Mugnano Data Consulting. For Onboarding it's used to aid in the delivery of the Cluster Install Report and for Architecture Assessments it's the primary toolset used. For all other services, it's provided for our customers use.

From script sprawl to structured, drill-down diagnostics.


When I assess customer Greenplum environments, I lean on a suite of lightweight shell utilities that make deep inspection fast and repeatable. At the core is gpview.sh, an interactive catalog viewer. Around it are wrappers for scheduled health checks, single-report runs, and persistence for trending over time.

Why gpview Exists: Taming “Script Sprawl”

If you’ve operated a large-scale application on any database, you probably have a graveyard of one-off SQL scripts: a folder full of ad hoc checks, temp files, and half-remembered names. The cost isn’t just clutter, it’s time lost hunting for the right script, re-writing something you already solved, or forgetting the subtle parameters that made a query safe to run in production.

gpview.sh is the third generation of my answer to that problem. Long before Greenplum, there was oraview.ksh for Oracle7 (yes, korn shell on 7, that should tell you how long ago that was), then db2view.sh for Db2 DPF. Each iteration kept the same idea: take all those scattered scripts, convert them into well-named shell functions, and place them behind a navigable menu with sensible drill-downs. No mystery filenames. No tribal knowledge. Just a consistent way to ask deeper questions, one menu at a time.

Design principles that stuck across all 3 generations:
  • Functions, not files: every check lives as a named function (discoverable, callable, testable).
  • Menus & drill-downs: start broad, then zoom in (e.g., cluster → database → schema → table → partition, etc).
  • Plain psql: no vendor tool lock-in, easy to run anywhere you can reach the cluster.
  • Batchable: the same functions power scheduled health checks and persistence for trending.
  • Human-readable output: formatted with psql expand/alignment when that makes sense for quick triage.

The gpview Family of Scripts

All front-ends call into a shared library, gpview_funcs.sh, which hosts the actual report functions (SQL + OS checks). Each script layers a different workflow on top:

  • gpview.sh – Interactive, menu-driven browsing of Greenplum catalogs and diagnostics with drill-downs.
  • auto_health.sh – Batch health checks driven by a run-group file; packages results and can email a tarball.
  • run_report.sh – Run a single report on demand or via cron; optional email of the output.
  • persist_report.sh – Run a single report and copy results into a database table for historical trending.
  • gpview_help.sh - Each function provides a description of the report and the "why" behind the report.
  • email_diff_reports.sh (not shown) – Alerting for OS/config drift by emailing when diffs are detected. Utlizes the OS-level functions in gpview_funcs.sh

What It’s Like to Use

The interactive flow mirrors how practitioners think during a real investigation: start with a broad status view, then drill into the “why” with focused sub-reports. Examples:

  • Architecture & Config: segment layout, versions, extensions, critical parameters.
  • Reliability: mirroring state, faults, background workers.
  • Performance: skew, spill, bloat, locks, heavy queries, maintenance cadence.
  • Capacity: database/schema/table growth and hot spots.
  • OS Hygiene: kernel tuning, filesystem posture, environment drift.

Quick Starts

Interactive Exploration with gpview.sh

Fully interactive (prompts for connection info):

./gpview.sh

Skip prompts using standard psql parameters:

./gpview.sh -d mydb -U dba_user -h mdw.example.com -p 5432
# or shorthand if your env is set:
./gpview.sh mydb
Automated Health Check with auto_health.sh
./auto_health.sh \
  -d mydb \
  -g run_health_groups \
  -t ACME_CORP \
  -f /path/to/seghosts \
  --emailto ops@example.com --emaildba

Output includes OS and GP reports plus a timings log, bundled as: health_reports_ACME_CORP_mydb.tgz

Single Report (cron-friendly) with run_report.sh
./run_report.sh -d mydb -r getTableSkew -o /tmp/table_skew.txt --emailto dba@example.com

Tip: ./run_report.sh -h prints valid report names.

Persist for Trending with persist_report.sh
./persist_report.sh \
  -d mydb \
  -r getBloatSummary \
  -t analytics.dbamonitor.bloat_trend

No table specified? The script prints the report’s column headers for you.

Drift Alerts with email_diff_reports.sh

After persistence is in place, this utility checks recent vs. prior OS/config results (e.g., kernel params) and emails your DBA team if differences are found. Add it to cron to keep a watchful eye between full health checks.

Consulting Tip: Use auto_health.sh for quarterly baselines, run_report.sh for weekly spot checks, and persist_report.sh for month-over-month trending. Pair with a BI tool to visualize bloat, skew, and configuration drift.

Installation (really just placement)

No installer needed. Copy the scripts to a Linux host that can reach your Greenplum master. For the richest data, run as gpadmin (or another superuser) on the master host so file change checks (e.g., postgresql.conf, pg_hba.conf) are visible.


Previous
Previous

Rethinking Disaster Recovery for MPP Databases

Next
Next

Kickstarter: Automating Backup, Replicate & Restore in Greenplum