HPE Cray EX HPC Firmware Pack Installation Guide

1 Copyright and Version

© Copyright 2022-2024 Hewlett Packard Enterprise Development LP. All third-party marks are the property of their respective owners.

HFP-DOCS: 24.10.1-9

Doc git hash: 797fe3e0d7379a3dca10cb1fc0a8e6f77ffb9daa

Generated: Mon Nov 04 2024

2 Release Information

2.1 Overview

This document uses the terms “install”, “update”, and “upgrade” interchangeably as the HFP procedure is the same in each scenario.

When installing the HPC Firmware Pack (HFP) on a system managed by Cray System Management (CSM), refer to the HPE Cray EX System Software Stack Installation and Upgrade Guide for CSM for a high-level overview of the installation and upgrade workflow for all products. It also includes the software and hardware compatibility matrix, third-party product documentation links, and cross-product dependencies.

HFP is installed primarily by running Install and Upgrade Framework (IUF) commands, which are documented in the Cray System Management documentation and described in the Install or Upgrade HPC Firmware Pack and IUF Stage Details for HFP sections of this guide.

2.2 Product Details

The HPE Cray EX HPC Firmware Pack (HFP) provides firmware packages for HPE Cray EX system hardware. The HFP product distribution tar.gz file includes a NOTES.txt file which lists the recommended firmware packages to install.
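
The recommended package list can be read without unpacking the whole distribution file. The following is a minimal sketch, not an HFP-provided tool: the function name show_hfp_notes is hypothetical, and because this guide does not state where NOTES.txt sits inside the archive, the sketch searches the archive listing for a member whose file name is exactly NOTES.txt.

```shell
# Print NOTES.txt from an HFP distribution tar.gz without unpacking it.
# show_hfp_notes is an illustrative name, not part of HFP.
show_hfp_notes() {
    archive="$1"
    # Locate the first archive member whose file name is exactly NOTES.txt.
    member="$(tar -tzf "$archive" | grep -E '(^|/)NOTES\.txt$' | head -n 1)"
    if [ -z "$member" ]; then
        echo "NOTES.txt not found in $archive" >&2
        return 1
    fi
    # -O writes the extracted member to standard output instead of to disk.
    tar -xzOf "$archive" "$member"
}
```

Called as show_hfp_notes followed by the path to the downloaded tar.gz file, the function prints the notes or returns a non-zero status if the archive contains no NOTES.txt.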

NOTE: Firmware packages for certain hardware components are not included in HFP and must be acquired from the hardware vendor, for example Dell and Mellanox network switch firmware.

2.3 Firmware Details

This is a list of firmware files that are part of the HPC Firmware Pack.

Firmware Group Firmware Product Current Version Updatable Via FAS
Cc Nc mtn-ccnc-firmware-1.9.5.39-20240714022451_2e62b6a8f5f0.x86_64.rpm 1.9.5-39 Yes
EX255a mtn-ex255a-bios-1.3.0-20240927215950_53e59933d03b.x86_64.rpm 1.3.0 Yes
EX254n ex254n.bios-2.0.4-20240717135525_9178f60.x86_64.rpm 2.0.4(B) Yes
EX254n.EROT EX254n.erot-1.3.136-20240815061718_636f31ad96b5.x86_64.rpm 01.03.0136.0000_n01 Yes
EX254n.VBIOS EX254n.vbios-150.172.3-20240809103617_548981ac7f56.x86_64.rpm 96.00.AC.00.03 Yes
XD224 BMC XD224.BMC-01.17.00.1001-20240814162333_15a6f1e742a6.x86_64.rpm 01.17.00.1001 Yes
XD224 SBIOS XD224.SBIOS-25.01-20240814164944_d699e78545d1.x86_64.rpm 25.01 Yes
GIGABYTE sh-svr-1264up-bios-23.08.00-20230818172157_56b85f3.x86_64.rpm C38 (Semantic Version 23.08.00) Yes
sh-svr-3264-bios-23.08.00-20230818172157_56b85f3.x86_64.rpm C38 (Semantic Version 23.08.00) Yes
sh-svr-5264-gpu-bios-23.08.00-20230818172157_56b85f3.x86_64.rpm C38 (Semantic Version 23.08.00) Yes
EX235n mtn-ex235n-bios-1.5.1-20241007153237_a81a6e1219c2.x86_64.rpm 1.5.1 Yes
EX425 mtn-ex425-bios-1.7.6-20241004162353_6fccdd901078.x86_64.rpm 1.7.6 Yes
EX4252 mtn-ex4252-bios-2.0.0-20240927184118_1d2f956e3cbd.x86_64.rpm 2.0.0 Yes
EX235a mtn-ex235a-bios-2.0.0-20241008160020_7ed283572e72.x86_64.rpm 2.0.0 Yes
AMD MI200 AccVbios AMD_MI200-2.0.0-20220810115835_94954b0.x86_64.rpm 113-D65201-046-609321 (HFP internal/Semantic Version 2.0.0) Yes
AMD MI200 GPU RM MI200RM-010-3.16.0-20221205015449_d187d03.x86_64.rpm 3.16.0 Yes
EX420 mtn-ex420-bios-1.4.0-20240821163753_1700a87e3ce9.x86_64.rpm 1.4.0 Yes
GPU MI100_D3431500_100-1-0.x86_64.rpm D3431500-100 No
NIC wnc.i210-2.1.1-20240906105259_9d7e61cfbc59.x86_64.rpm p2sn01 Yes
Aruba ArubaOS-CX_6400-6300_10_13_1040.swi 10.13.1040 No
ArubaOS-CX_8320_10_13_1040.swi 10.13.1040 No
ArubaOS-CX_8325_10_13_1040.swi 10.13.1040 No
ArubaOS-CX_8360-8100_10_13_1040.swi 10.13.1040 No
Apollo 6500 FAS-BIOS-HPE_XL645d-Gen10Plus-3.20-1.x86_64.rpm 3.20_08-07-2024 Yes
A48_3.20_08_07_2024.fwpkg 3.20_08-07-2024 No
Apollo 6500 Gen10 Plus FAS-BIOS-HPE_XL675d-Gen10Plus-3.20-1.x86_64.rpm 3.20_08-07-2024 Yes
A47_3.20_08_07_2024.fwpkg 3.20_08-07-2024 No
DL325 Gen10 Plus FAS-BIOS-HPE_DL325-3.20-1.x86_64.rpm 3.20_08-07-2024 Yes
A43_3.20_08_07_2024.fwpkg 3.20_08-07-2024 No
DL385 Gen10 Plus FAS-BIOS-HPE_DL385-3.20-1.x86_64.rpm 3.20_08-07-2024 Yes
A42_3.20_08_07_2024.fwpkg 3.20_08-07-2024 No
iLO 5 iLO5_FW_LNXSC_278-firmware-ilo5-2.78-1.1.x86_64.rpm 2.78 No
FAS-HPE_ILO5-2.78-3.x86_64.rpm 2.78 Yes
iLO5_FW_LNXSC_307-firmware-ilo5-3.07-1.1.x86_64.rpm 3.07 No
FAS-HPE_ILO5-3.07-1.x86_64.rpm 3.07 Yes
DL360 Gen10 Plus, DL380 Gen10 Plus FAS-BIOS-HPE_DL360_DL380_Gen10_Plus-2.20-1.x86_64.rpm 2.20_08-07-2024 Yes
U46_2.20_08_07_2024.fwpkg 2.20_08-07-2024 No
DL360 Gen11, DL380 Gen11 FAS-BIOS-HPE_DL360_DL380_Gen11-2.22-1.x86_64.rpm 2.22_06-19-2024 Yes
U54_2.22_06_19_2024.fwpkg 2.22_06-19-2024 No
DL385 Gen11, DL365 Gen11 FAS-BIOS-HPE_DL365_DL385_Gen11-1.66-1.x86_64.rpm 1.66_07-11-2024 Yes
A55_1.66_07_11_2024.fwpkg 1.66_07-11-2024 No
DL325 Gen11, DL345 Gen11 FAS-BIOS-HPE_DL325_DL345_Gen11-1.66-1.x86_64.rpm 1.66_07-11-2024 Yes
A56_1.66_07_11_2024.fwpkg 1.66_07-11-2024 No
iLO 6 iLO6_FW_LNXSC_162-firmware-ilo6-1.62-1.1.x86_64.rpm 1.62 No
FAS-HPE_ILO6-1.62-1.x86_64.rpm 1.62 Yes
NVIDIA Acc FPGA NVIDIA.HGX.A100.4.GPU.40-2.7.3-20231030115132_d3ab9f3df485.x86_64.rpm 2.71 Yes
Redstone FW (NVIDIA 4 GPU) NVIDIA_HGX_A100_x4_SXM4_40GB_AirCooled_GPU-92.00.36.00.04-0.x86_64.rpm 92.00.36.00.04 No
NVIDIA_HGX_A100_x4_SXM4_40GB_LiquidCooled_GPU-92.00.36.00.05-0.x86_64.rpm 92.00.36.00.05 No
NVIDIA_HGX_A100_x4_SXM4_80GB_Air-Cooled_GPU-92.00.94.00.0A_92.00.94.00.04.scexe 92.00.94.00.0A-92.00.94.00.04-rev1 No
NVIDIA_HGX_A100_x4_SXM4_80GB_Liquid-Cooled_GPU-92.00.94.00.0B_92.00.94.00.05.scexe 92.00.94.00.0B-92.00.94.00.05-rev1 No
Delta FW (NVIDIA 8 GPU) NVIDIA_HGX_A100_x8_SXM4_40GB_Air-Cooled_GPU-92.00.45.00.03-0.x86_64.rpm 92.00.45.00.03 No
NVIDIA_HGX_A100_x8_SXM4_40GB_Liquid-Cooled_GPU-92.00.45.00.04-0.x86_64.rpm 92.00.45.00.04 No
NVIDIA_HGX_A100_x8_SXM4_80GB_Air-Cooled_GPU-92.00.9E.00.01_92.00.9E.00.03.scex 92.00.9E.00.01-92.00.9E.00.03-Rev1 No
NVIDIA_HGX_A100_x8_SXM4_80GB_Liquid-Cooled_GPU-92.00.45.00.06-0.x86_64.rpm 92.00.45.00.06 No

2.4 Differences from Prior Release

Some previously omitted NVIDIA firmware was updated and is now included in this version of the HFP. Firmware available in this HFP includes:

Other NVIDIA firmware is still omitted from this HFP release. The omitted packages include:

These components will be updated in a future release of HFP.

2.5 Performing Firmware Upgrades

HFP provides the firmware packages for HPE Cray EX systems but does not itself perform firmware upgrades. Firmware can be upgraded in two ways: with or without the Firmware Action Service (FAS), which is present only on systems managed by CSM. The method used to install HFP depends on whether FAS is installed and operational, as described in the following subsections.

2.5.1 Updating BMC Firmware and BIOS for ncn-m001

Refer to the Updating BMC Firmware and BIOS for m001 section of the Cray System Management Documentation.

2.5.2 Upgrading Firmware With FAS

Systems managed by CSM most often perform firmware upgrades using FAS. The Install or Upgrade HPC Firmware Pack section of this document describes how to install HFP on a CSM-managed system with FAS installed and operational. The Install HPC Firmware Pack from PIT or LiveCD section of this document describes how to install HFP on a CSM-managed system booted into the Pre-Install Toolkit (PIT) or LiveCD environments (typically only the case when the system is being installed for the first time).

FAS details can be found in the Update Firmware with FAS section of the Cray System Management Documentation.

2.5.3 Upgrading Firmware Without FAS

On systems without FAS, firmware provided by HFP can be installed by following the instructions included in the directory of the HFP product distribution tar.gz file that contains the firmware package.

Each hardware product directory includes firmware packages (fwpkg, rpm, …) as well as a DOC directory with vendor documentation, including installation instructions. The following are example directory listings for HPE_XL675d-Gen10Plus (HPE ProLiant XL675d) and GB_SVR_1264UP_C17_C21 (Gigabyte 1264UP) hardware:

    HPE_XL675d-Gen10Plus/
        A47_2.40_02_23_2021.fwpkg
        DOC/
            HPCM-Firmware-Flash_v2021.03.04.pdf
            INSTALL.txt
            README.txt
        FAS-BIOS-HPE_XL675d-Gen10Plus-2.40-1-sles15sp1.x86_64.rpm
 
    GB_SVR_1264UP_C17_C21/
        DOC/
            BMCFirmwareUpdate.txt
            Gigabyte-Shasta-Firmware-Update.pdf
            README.txt
            Relnotes_MZ32-AR0-YF_C17_F01.pdf
            Relnotes_MZ32-AR0-YF_C17_Rome.pdf
            Relnotes_MZ32-AR0-YF_Naples.pdf
        sh-svr-1264up-bios-21.00.00-20210325025941_8df4708.x86_64.rpm

Focusing on the HPE_XL675d-Gen10Plus directory listing:

2.5.3.1 iLO Information

Documentation in some, but not all, of the DOC directories states that HPE Integrated Lights Out (iLO) server management software can be used to install the firmware. In those cases, the following documentation provides additional details on how to use iLO and may be of interest.

NOTE
* For these servers, HFP provides only iLO and BIOS firmware. Download the remaining drivers and firmware from the Service Pack for ProLiant (SPP) and Apollo Servers.

3 Install or Upgrade HPC Firmware Pack

This section describes how to install HFP on a CSM-managed system with FAS installed and operational.

The Install and Upgrade Framework (IUF) provides commands which install, upgrade, and deploy products on systems managed by CSM. IUF capabilities are described in detail in the IUF Section of the Cray System Management Documentation. The initial install and upgrade workflows described in the HPE Cray EX System Software Stack Installation and Upgrade Guide for CSM (S-8052) detail when and how to use IUF with a new release of HFP or other HPE Cray EX products.

Read the Overview section of this document to understand what is and is not executed as part of the HFP install process. See the Upgrading Firmware Without FAS section of this document for instructions on systems not managed by CSM.

3.1 Install and Upgrade Framework

IUF will perform the following tasks for a release of HFP.

3.2 IUF Stage Details for HFP

This section describes any HFP details that an administrator needs to be aware of before running IUF stages. Entries are prefixed with Information if no administrative action is required or Action if an administrator needs to perform tasks outside of IUF.

Information: The pre-flight check stage displays the installed HFP versions, verifies that Nexus, FAS, and S3 are running, displays the current FAS version, and lists the RPM and zip files that will be loaded into FAS.

Action: If the command cray fas loader nexus create fails during the post-install-test-fas.sh check (which verifies that the firmware is loaded in FAS), run the command manually. The failure occurs when loaderStatus="$(cray fas loader list 2>&1 | grep loaderStatus)" still returns busy after waiting 200 seconds for the ready status.

Run the following command to check if the cray fas loader status is ready or busy.

ncn-mw# cray fas loader list --format json | grep loaderStatus

This will return a ready or busy status. Example: loaderStatus = "ready"

If the status is busy, rerun the same command until it reports ready. Once the FAS loader status is ready, reinstall HFP or load the firmware directly from Nexus using FAS commands. For more information, refer to Load Firmware from Nexus.
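
The wait described above can be scripted. The following is a minimal sketch, not part of HFP or IUF: it wraps the status command shown earlier in a polling loop. The function names fas_loader_status and wait_for_fas_loader, the 600-second default limit, and the 10-second default interval are illustrative assumptions rather than documented values.

```shell
# Print the loaderStatus line(s) reported by FAS, using the command above.
fas_loader_status() {
    cray fas loader list --format json 2>&1 | grep loaderStatus
}

# Poll until the status contains "ready" or the time limit is reached.
# $1 = maximum seconds to wait (assumed default 600)
# $2 = seconds between checks (assumed default 10)
# Returns 0 when the loader is ready, 1 on timeout.
wait_for_fas_loader() {
    max_wait="${1:-600}"
    interval="${2:-10}"
    waited=0
    while :; do
        status="$(fas_loader_status)"
        case "$status" in
            *ready*) echo "FAS loader is ready"; return 0 ;;
        esac
        if [ "$waited" -ge "$max_wait" ]; then
            echo "FAS loader not ready after ${waited}s: ${status}" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

For example, wait_for_fas_loader 900 15 checks every 15 seconds for up to 15 minutes. Because it returns a non-zero exit status on timeout, it can gate a later reinstall or manual load step.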

4 Documentation for Each Firmware Unit

The documentation is used for manually installing firmware when not using FAS on HPE Cray EX.

Documentation for each firmware unit is alongside the firmware in the overall package.

├── HPE_XL675d-Gen10Plus                                           <----- Hardware type this firmware is for
│   ├── A47_2.40_02_23_2021.fwpkg                                  <----- File used for manual installation
│   ├── DOC                                                        <----- Documentation
│   │   ├── INSTALL.txt
│   │   └── README.txt
│   └── FAS-BIOS-HPE_XL675d-Gen10Plus-2.40-1-sles15sp1.x86_64.rpm  <----- rpm used by FAS for update

├── GB_SVR_1264UP_C17_C21
│   ├── DOC
│   │   ├── BMCFirmwareUpdate.txt
│   │   ├── Gigabyte-Shasta-Firmware-Update.pdf
│   │   ├── README.txt
│   │   ├── Relnotes_MZ32-AR0-YF_C17_F01.pdf
│   │   ├── Relnotes_MZ32-AR0-YF_C17_Rome.pdf
│   │   └── Relnotes_MZ32-AR0-YF_Naples.pdf
│   └── sh-svr-1264up-bios-21.00.00-20210325025941_8df4708.x86_64.rpm
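
To find the vendor instructions for a given hardware type, it can help to print each hardware directory together with its firmware packages and DOC files. The sketch below assumes the HFP distribution file has already been unpacked and that the directory passed in directly contains the hardware directories shown above. The function name list_hfp_docs is hypothetical.

```shell
# For each hardware directory under the given root, list the firmware
# packages it holds and the files in its DOC directory.
list_hfp_docs() {
    root="$1"
    for hw in "$root"/*/; do
        # Skip anything that does not follow the <hardware>/DOC layout.
        [ -d "${hw}DOC" ] || continue
        basename "$hw"
        # Regular files next to DOC are the firmware packages.
        for f in "$hw"*; do
            if [ -f "$f" ]; then
                echo "  firmware: $(basename "$f")"
            fi
        done
        # Files inside DOC are the vendor documentation.
        for f in "${hw}DOC"/*; do
            if [ -f "$f" ]; then
                echo "  doc: $(basename "$f")"
            fi
        done
    done
}
```

Directories without a DOC subdirectory are skipped, so unrelated content in the same location does not appear in the listing.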