Kickstarter: Automating Partition Maintenance in Greenplum

This toolset is a core component of Mugnano Data Consulting’s DBA Enablement service offering and ships in our DBA Operations Kickstarter suite, providing turnkey practices and tooling for Greenplum operations.

Partitioning is a cornerstone of scalable analytics in Greenplum Database. By splitting large tables into manageable ranges, query pruning lowers scan costs and improves performance. In addition, Partitioning optimizes data retention “rolling window” designs.

The hardest part of partitioning isn’t design, it’s keeping partitions current: adding new ranges, removing expired ones, and externalizing or restoring historical data. In many architecture assessments, utilizing our health check scripts, I’ve found 70+% of a table’s rows sitting in the default partition because teams stopped adding date partitions (See how to audit your default partitions →). That effectively disables pruning and erases the performance benefits of partitioning altogether.

The kickstarter Partition Maintenance toolset is a set of Bash/Python utilities that operationalizes this lifecycle for Greenplum, now with a fully table-driven configuration model plus validation and alerting so DBAs can manage dozens (or hundreds) of tables predictably.

Default partitions are a safety net, not a data lake :-) If most of your data lands in DEFAULT, you’ve lost partition pruning.
— Architecture Assessment Document

Supported Partition Designs

  • Range partitions on DATE, TIMESTAMP, or NUMERIC columns

  • Range with list sub-partitions (template-driven)

  • Default/catch-all partition support

  • List partitions with range sub-partitions

  • Externalized oldest partitions for archival

The functionality in this toolset makes it possible to automate the conceptual “Polymorphic Design” Greenplum has encouraged for many years. A Polymorphic design architecture allows you to configure a single table utilizing multiple physical design formats as shown below:

Core Functionality

  • ADD_HIGH_PARTS: Calculates and add new “high” partitions (future ranges with boundary support)

  • ADD_LOW_PARTS: Backfill historical ranges

  • RMV_PARTS: Remove oldest partitions (and sub-partitions) with boundary support

  • EXTERNALIZE_PARTS / EXTERNALIZE_SPECIFIC_PART: Detach to external tables (e.g., gpfdist)

  • INTERNALIZE_SPECIFIC_PART: Re-attach prior external partitions

Table-Driven Configuration

All partition policies reside in a kickstarter configuration table called dbaconfig.part_maint_config. Each row defines the target table, action, number of partitions to add/remove, and the cadence/threshold using intuitive interval clauses. This makes the process self-documenting and easy to audit.

-- Example: ensure 4 future monthly partitions, only if less than 5 months exist
INSERT INTO dbaconfig.part_maint_config (
   maint_db 
  ,schema_nm 
  ,tbl_nm 
  ,action 
  ,num_parts 
  ,interval_clause
  ,interval_check
) VALUES (
   'gpadmin'
  ,'dbamonitor'
  ,'os_check'
  ,'ADD_HIGH_PARTS'
  ,4
  ,'1 month'
  ,'5 months'
);

Execution Orchestrator (Table-Driven Runner)

A lightweight bash shell runner reads the configuration table and invokes gpPartMaint for each configured row. It batches actions per database, executes them in order, then produces and emails a partition report to the DBA team. This is ideal for a weekly cron or other scheduler system scheduled job.

# Pseudo-flow of the runner
for db in (SELECT DISTINCT maint_db FROM dbaconfig.part_maint_config)
  SELECT per-table options FROM dbaconfig.part_maint_config WHERE maint_db = db;
  for each row: 
      gpPartMaint.sh -cfg gpPartMaint_cfg -db  -schema ... -table ... -action ...
  

gpPartMaint.sh -cfg gpPartMaint_cfg -db -action REPORT_PARTS -rpt_file_name email _partition_report.out to DBA contacts

Validation & Alerting

A companion validation script cross-checks the set of range-partitioned tables against the configuration, flagging missing or invalid entries. The results can be surfaced via email to your DBA team and/or application teams to ensure every partitioned table is either configured or explicitly acknowledged.

Conclusion

The Partition Maintenance toolset elevates partition maintenance from manual DDL to a repeatable platform: a central configuration table defines policy, the orchestrator runs actions for every entry, and validation & alerting keep governance tight. As part of our DBA Enablement methodology and DBA Operations Kickstarter suite, this approach accelerates time-to-value for Greenplum operations while shrinking DBA toil.

If you're interested in seeing this in action or exploring how it could improve your DB Operations, contact Mugnano Data Consulting to schedule a demo.

Previous
Previous

Our Blogs: A Curated Guide to Our Technical Posts

Next
Next

My Best Friend Lambchop 🐑🐾