Automating Agile Data Onboarding with Greenplum Sailfish
At Mugnano Data Consulting, we help clients unlock agility in their data workflows. One example is Greenplum Sailfish, a lightweight orchestration layer that bridges the gap between ad hoc analyst workflows and structured data pipelines. I originally addressed this need back in 2018 in a Pivotal-branded whitepaper that demonstrated a similar architecture using AWS-native components, but the need itself was never cloud-specific: since then, many of the on-prem customers I’ve worked with have faced the same challenges. This blog revisits that design, reimplementing the orchestration with Control-M and Bash scripting to deliver the same agile experience in an on-prem Greenplum environment.
The Challenge
Data analysts often need a fast, reliable way to analyze CSV files without waiting on long ETL development cycles or database administrator (DBA) intervention. Traditional ingestion pipelines are optimized for recurring jobs, not for ad hoc exploration.
The Solution: Greenplum Sailfish
Greenplum Sailfish is a simple, on-prem solution built to enable analysts to self-serve CSV file ingestion into Greenplum Database for immediate analysis. It combines:
Control-M for job orchestration
Linux-based Bash scripting for OS- and database-level actions
Greenplum’s external tables for schema-on-read access
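The schema-on-read piece rests on Greenplum external tables: the CSV never gets loaded, it is simply registered and served by gpfdist. Below is a minimal sketch of the DDL-generation step. The gpfdist host and port (`etl-host:8081`), the column list, and the function name are illustrative assumptions, not the actual Sailfish implementation:

```shell
#!/usr/bin/env bash
# Sketch: emit the DDL Sailfish would run for one dropped CSV file.
# GPFDIST_URL is a placeholder; a real deployment points it at the
# gpfdist process serving the sandbox directory.
set -euo pipefail

GPFDIST_URL="gpfdist://etl-host:8081"

gen_ddl() {
    local file="$1"            # e.g. sales.csv
    local base="${file%.csv}"  # strip extension -> sales
    cat <<SQL
CREATE READABLE EXTERNAL TABLE ext_${base} (
    -- hypothetical columns; a real deployment would derive
    -- these from the CSV header row
    col1 text,
    col2 text
)
LOCATION ('${GPFDIST_URL}/${file}')
FORMAT 'CSV' (HEADER);

CREATE VIEW vw_${base} AS SELECT * FROM ext_${base};
SQL
}

# Example: print the DDL that would be piped to psql for sales.csv
gen_ddl "sales.csv"
```

Because the table definition is just generated text, creating and dropping these objects is cheap, which is what makes the automatic setup and teardown described below practical.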
How It Works: Step-by-Step Workflow
Drop Your File: The analyst places a `.csv` file into the designated `Sailfish_dropbox` folder.
File Detection: A Control-M File Watcher detects the new file and extracts the filename for further automation.
Trigger Processing Job: Control-M calls the Sailfish OS-level job with the file name and action `genFileExternalTable`.
Move and Prepare File: The Sailfish script moves the file into a sandbox directory and prepares it for external table creation.
Create External Table + View: Sailfish generates a Greenplum external table (`ext_<filename>`) and a view (`vw_<filename>`) to provide structured access to the file’s contents.
Notify the User: An email is sent to the analyst with the name of the view, ready to be queried with SQL or any reporting tool (Tableau, etc.).
Perform Analysis: Analysts can now query and analyze the data on demand, with no waiting on pipelines or loading delays.
Clean-Up: Once finished, the analyst deletes the file from the sandbox directory.
Auto-Teardown: The system detects the file deletion and automatically drops the external table and view from Greenplum.
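The workflow above can be sketched as a single OS-level script that Control-M invokes with a filename and an action. The directory paths, the `teardown` action name, and the commented psql/mail calls are assumptions for illustration, not the actual Sailfish code:

```shell
#!/usr/bin/env bash
# Sketch of the Sailfish job Control-M calls with <file> and <action>.
# DROPBOX/SANDBOX paths are hypothetical defaults.
set -euo pipefail

sailfish() {
    local file="$1" action="$2"
    local base="${file%.csv}"
    case "$action" in
        genFileExternalTable)
            # Step 4: move the dropped file into the sandbox
            mv "${DROPBOX}/${file}" "${SANDBOX}/${file}"
            # Step 5: create ext_<name> and vw_<name> in Greenplum, e.g.:
            #   generate_ddl "$file" | psql -d analytics
            # Step 6: notify the analyst, e.g.:
            #   echo "Query vw_${base}" | mail -s "Sailfish: file ready" analyst
            echo "created ext_${base} and vw_${base}"
            ;;
        teardown)
            # Step 9: drop the objects once the sandbox copy is deleted, e.g.:
            #   psql -d analytics -c "DROP VIEW vw_${base};
            #                         DROP EXTERNAL TABLE ext_${base};"
            echo "dropped ext_${base} and vw_${base}"
            ;;
        *)
            echo "unknown action: ${action}" >&2
            return 1
            ;;
    esac
}

# Demo with throwaway directories so the sketch runs anywhere:
DROPBOX="$(mktemp -d)"
SANDBOX="$(mktemp -d)"
touch "${DROPBOX}/sales.csv"
sailfish sales.csv genFileExternalTable
```

Keeping the database work behind a single action parameter is what lets one Control-M job definition (plus a File Watcher for detection) drive both setup and teardown.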
Key Benefits
Zero DBA Involvement – Enables self-service file loading for analysts.
Fast Turnaround – From file drop to data availability in minutes.
No Data Movement – Uses external tables, so the data stays in place.
Auditable and Maintainable – Each step is logged, versioned, and repeatable.
When to Use Sailfish
Greenplum Sailfish is ideal for:
Prototyping and exploratory analysis
One-time loads and temporary staging
Teams that want agility without compromising governance
If you're interested in implementing agile ingestion workflows like Sailfish in your Greenplum or Greenplum-compatible environment, visit us at https://www.mugnanodc.com/contact. We’ll help you design and automate the perfect solution for your team.