When “Nothing Changed” Breaks Your Greenplum Performance

If you run Greenplum and have ever said, “Nothing changed, but now Informatica jobs are slow,” this post is for you.

Recently we worked with a customer running Greenplum on bare metal who experienced significant performance degradation in ETL workloads, especially Source Qualifier-heavy Informatica jobs. The slowdown began shortly after what was described as a minor infrastructure change.

  • Nothing was down
  • No application errors
  • CPU, disk, and memory looked normal

But throughput dropped noticeably.


The symptoms pointed in the wrong direction

At first glance, the environment looked healthy:

  • Client connectivity was stable
  • Simple queries ran quickly
  • No storage bottlenecks were obvious
  • No crashes or fatal errors

However, under sustained load, data movement queries slowed significantly. Subtle warnings began appearing in the database logs. They were not fatal errors, so they were easy to overlook.

The system was not failing outright.

It was retrying internally.

And those retries add up.
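Here is a rough back-of-the-envelope sketch of why they add up. The numbers are invented for illustration; the point is that even a 1% loss rate can dominate run time once every lost packet has to wait out a retransmit timeout that is orders of magnitude longer than a normal round trip:

```python
# Illustrative sketch: how a small loss rate compounds at ETL scale.
# All numbers are assumptions, not measurements from any real cluster.

def effective_slowdown(loss_rate: float, rtt_ms: float, retransmit_timeout_ms: float) -> float:
    """Approximate per-packet time inflation when a fraction of packets
    must be retransmitted after a timeout (simple single-retry model)."""
    normal = rtt_ms
    retried = rtt_ms + retransmit_timeout_ms
    expected = (1 - loss_rate) * normal + loss_rate * retried
    return expected / normal

# A 1% loss rate with a 250 ms retransmit timeout on a 0.5 ms RTT:
print(round(effective_slowdown(0.01, 0.5, 250.0), 1))  # prints 6.0, i.e. ~6x slower
```

Nothing in that scenario looks "down." Every packet eventually arrives. The job just takes several times longer.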


Why ETL workloads feel this first

Large ETL jobs are often the first to expose backend instability in distributed databases.

They:

  • Pull large result sets
  • Trigger redistribution across segments
  • Stress internal communication paths much more than simple queries

When the backend network is even slightly unstable, ETL is usually the first workload to show it.
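To put a rough number on the redistribution point above: when a join or aggregation forces rows to be rehashed across N segments, an evenly distributed table sends roughly (N-1)/N of its rows over the interconnect. A minimal Python sketch, with illustrative segment counts:

```python
# Sketch: why redistribution stresses the interconnect.
# Under an even hash distribution, a row's new segment matches its
# current one only 1/N of the time, so ~(N-1)/N of rows cross the network.

def fraction_moved(num_segments: int) -> float:
    """Expected fraction of rows leaving their current segment."""
    return (num_segments - 1) / num_segments

for n in (4, 16, 64):
    print(n, round(fraction_moved(n), 3))  # 0.75, 0.938, 0.984
```

The larger the cluster, the closer a redistribution comes to moving the entire table across the backend network, which is why simple point queries can look fine while ETL suffers.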


The layer most teams do not monitor

Most monitoring tools focus on:

  • Client to database latency
  • CPU and memory
  • Disk I/O

What they often do not clearly expose is:

  • Segment to segment communication
  • Packet handling inside the interconnect network
  • Internal retries that stretch execution time

That is where this issue lived.
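One place this layer does leave fingerprints is the kernel's UDP counters. As a sketch (the sample text below is made up; on a real Linux segment host you would read /proc/net/snmp itself), rising InErrors or RcvbufErrors alongside slow ETL is a strong hint that the interconnect is under pressure:

```python
# Sketch: surface backend UDP trouble that host-level dashboards often hide.
# Parses the Udp lines from Linux's /proc/net/snmp. SAMPLE is illustrative;
# on a segment host, pass open("/proc/net/snmp").read() instead.

SAMPLE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 8172634 12 5301 7921554 5117 0
"""

def udp_counters(snmp_text: str) -> dict:
    """Return the Udp counter row as a {name: value} dict."""
    lines = [l for l in snmp_text.splitlines() if l.startswith("Udp:")]
    header, values = lines[0].split()[1:], lines[1].split()[1:]
    return dict(zip(header, (int(v) for v in values)))

stats = udp_counters(SAMPLE)
# Receive-buffer overruns mean the host dropped datagrams it received intact.
print(stats["RcvbufErrors"])  # prints 5117
```

Sampling these counters before and after a heavy ETL window, on every segment host, tells you whether drops grow with load even when CPU, disk, and memory all look fine.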


“Disabling the monitoring agent fixed it”

One interesting detail was that disabling an observability agent appeared to improve performance.

It did not fix the root cause.

What it did was reduce system pressure just enough to hide the underlying weakness.

That distinction is important. If the architecture remains sensitive, the problem can return during:

  • Peak ETL windows
  • Higher data volumes
  • Future infrastructure changes

The real takeaway

Distributed databases like Greenplum are highly sensitive to backend network behavior, especially when:

  • Hosts have multiple network interfaces
  • Internal traffic uses UDP
  • Firewalls or security tooling are introduced
  • Routing or MTU settings change

These platforms do not always fail loudly.

Sometimes they just get slower.
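MTU drift is a good example of a change that fails quietly. As a sketch, assuming you have already collected the interconnect interface MTU from each segment host (the hostnames and values below are invented), flagging the odd host out takes only a few lines:

```python
# Sketch: flag MTU mismatches across interconnect hosts. The inventory is
# made up; in practice you would gather /sys/class/net/<iface>/mtu (or
# `ip link` output) from every segment host.
from collections import Counter

def mtu_mismatches(inventory: dict) -> dict:
    """Return hosts whose MTU differs from the cluster's most common value."""
    expected, _ = Counter(inventory.values()).most_common(1)[0]
    return {host: mtu for host, mtu in inventory.items() if mtu != expected}

inventory = {
    "sdw1": 9000,
    "sdw2": 9000,
    "sdw3": 1500,  # one host quietly reset after the "minor" change
    "sdw4": 9000,
}
print(mtu_mismatches(inventory))  # prints {'sdw3': 1500}
```

A single host like this will not take the cluster down. It will fragment or drop large interconnect packets, and the cluster will simply get slower.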


Does this sound familiar?

If you are seeing any of the following, it may be worth a deeper review:

  • ETL jobs slowing after infrastructure changes
  • Performance issues that disappear when load is reduced
  • Interconnect warnings in logs that are not fatal
  • Monitoring tools that report healthy infrastructure while applications disagree
  • Problems that appear only under concurrency

Need a second set of eyes?

If you are running Greenplum or a Greenplum-compatible MPP platform and performance does not align with system metrics, we help customers diagnose and correct exactly these types of issues.

Contact Mugnano Data Consulting to discuss your environment.

