NHR Project: Standards and interfaces for system-wide job-specific performance monitoring

Tuesday, May 31, 2022 9:00 AM to 6:30 PM · 9 hr. 30 min. (Europe/Berlin)
Foyer 3 + H - Ground Floor
HPC Workflows

Information

System-wide, continuous, job-specific hardware performance monitoring that provides reliable and relevant utilization metrics (such as main memory bandwidth, FLOP rates, instruction throughput, vectorization ratios, I/O rates, or communication frequencies and volumes) is a foundation of any PE-oriented user support in academic HPC centers. This joint project, funded by the National High Performance Computing Alliance (NHR), aims to formulate standards and interfaces for a job-specific performance monitoring infrastructure. These cover data formats, application programming interfaces, and guidelines and best practices for UI presentation and job classification. Two job-specific monitoring frameworks (PIKA and ClusterCockpit) implement and test the developed standards. All eight NHR centers participate in the meetings; two centers receive project funding and two more provide resources to the project. The project is open to external partners: many centers from HPC.NRW, as well as colleagues from GSC centers, regularly join the monthly meetings. The project is approved on an annual basis by the NHR alliance.
Contributors:

  • Jan Eitzinger (NHR@FAU)
  • Frank Winkler (Center for Information Services and High Performance Computing (ZIH))
Format
On-site