Data Modeling Tool for Greenplum
Ten to fifteen years ago, the Data Lake promised to centralize your data and solve all your analytical problems. Your organization likely bought into the idea. But did it deliver?
Eventually, many teams realize that truly understanding data, and building meaningful models, requires a top-down approach, one that reflects the business itself rather than just the underlying source systems. In this post, I’ll share how Mugnano Data Consulting has helped customers make this shift, and how Greenplum’s advanced features can unlock the performance they’ve been expecting from the platform.
One proven approach is the Kimball Bus Architecture combined with dimensional modeling. However, Greenplum users often face a key challenge: most data modeling tools don’t fully support Greenplum’s rich feature set. As a result, models become disconnected from development, just static documentation, outdated and unused.
This post introduces DbSchema, a data modeling tool I found effective for Greenplum. I’ll show how I worked with the vendor to make Greenplum a first-class citizen in the platform, supporting logical and physical design as well as DDL generation. I’ll also share in a future post how we’ve extended the tool, using our own dimensional modeling utilities, to deliver integrated, model-driven development for both business and database teams.
Introduction
This guide provides a comprehensive walkthrough for using DbSchema as a data modeling tool tailored for the Greenplum database. Given the limited native support for Greenplum in most modeling tools, DbSchema, with its extensibility and vendor responsiveness, has shown to be an ideal solution.
Starting from version 9.6.3, DbSchema officially supports Greenplum, incorporating enhancements suggested during our evaluation. This document aims to:
Demonstrate how to configure DbSchema for Greenplum.
Showcase how to leverage its extensibility features for automating DDL and DML code generation.
A future blog post will show the integration with the Mugnano Data Consulting dimension utilities framework for dimension management using the extensibility features of the DbSchema product.
Pre-requisite setup
To get started, ensure the following:
1) DbSchema Installation: Download and install DbSchema from DbSchema Official Website.
2) Greenplum Properties File: Obtain the greenplum.properties
file customized for this guide if you aren’t using a DbSchema version that supports Greenplum. You can also make a copy of this properties file and add cloudberry or synxdata since these databases are greenplum compatible currently. Please reach out via our contact page if you need the file.
3) Steps to Import Settings:
Configuring Greenplum-Specific Settings
DbSchema allows customization to accommodate Greenplum-specific constraints and features. These customizations are all in the Database Specific settings.
Behavior->Virtualization Adjustments
Virtualization lets you create model constraints (e.g., primary keys or foreign keys) without applying them in the database. This is essential as:
Greenplum does not support foreign keys.
Primary keys are only supported on Heap tables in versions prior to 7, with Append-Optimized (AO) table support added in version 7.
SQL Syntax Adjustments
DbSchema's extensibility enables dynamic DDL generation based on table attributes:
1. Primary Keys:
Heap tables: Primary key creation is included.
AO tables (pre-v7): Primary key creation is commented out.
2. Foreign Keys:
All foreign key definitions are commented out in generated DDL.
3. Custom Table Options:
Set Greenplum-specific attributes such as storage options, distribution methods, and partitioning.
4. Create Table:
The Create Table section is where extensibility code resides to bring everything together and generate the DDL and code desired and specified by the Data Modeler. The extensibility code in the DbSchema tool uses the groovy programming language and a published API defined in the documentation here: https://dbschema.com/javadoc/index.html. A future blog post specific to the Mugnano Data Consulting Type-2 dimension utilities will provide an example of how this extensibility code was exploited to implement the full solution.
Continue the journey
Check out how we’ve leveraged the DbSchema tool along with custom code generation techniques for dimensional modeling engagements via the next blog: