JVM memory bug in HMC V9R2M950

Feb 12, 2021

We had a customer run into this bug. Their HMC manages 7 POWER9 managed systems and 150 LPARs with Simplified Remote Restart enabled, and as a result they have had to reboot the HMC about every 10 days. The details below come from the following link, so always check it for updated information.

https://www.ibm.com/support/pages/node/6398722

Problem

Navigating the HMC Enhanced UI can result in the page displaying the following messages:

Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /ui/sfp/.

Reason: Error reading from remote server

Symptom

The HMC Enhanced UI becomes unusable within only a few hours or a few days of run time after the HMC is rebooted. Once the “Proxy Error” is returned, managing Virtual I/O Servers, partitions and managed systems becomes impossible.

Typically, the symptom is reported after an existing HMC is upgraded to V9R2M950. However, a scratch (new) install of V9R2M950 can exhibit the same problems.

The following related SRCs may also be reported on the HMC:

E212E116: exceeded the number of threads
E332FFFF: Java dump posted
E23D040C: [*PCERROR-D] core dump of a process
E23D0503: core dump of a process
E3D46FFF: call home exception

Cause

The core JVM runs out of memory when the Simplified Remote Restart capability is enabled for some or all partitions. The more managed systems the HMC manages, and the more partitions that have the feature enabled, the faster the JVM runs out of memory.

Environment

7063-CR1
Virtual Appliance for x86
Virtual Appliance for ppc
HMC Version 9 Release 2 M950

Diagnosing The Problem

If an HMC at V9R2M950 returns the “Proxy Error” after some uptime following a reboot, this problem is confirmed as the cause.

Resolving The Problem

The workaround is to reboot the HMC whenever the “Proxy Error” is received. This provides relief until the JVM runs out of memory again. Another workaround, which avoids the reboots, is to disable Simplified Remote Restart across the entire customer environment.

Reinstalling the HMC will not resolve the cause of the problem.

An official fix is being developed and is planned for release on Fix Central in a February 2021 PTF.