NHR Project: Standards and interfaces for system-wide job-specific performance monitoring
Tuesday, May 31, 2022 9:00 AM to 6:30 PM · 9 hr. 30 min. (Europe/Berlin)
Foyer 3 + H - Ground Floor
HPC Workflows
Information
System-wide, continuous, job-specific hardware performance monitoring that
provides reliable and relevant utilization metrics (such as main memory
bandwidth, FLOP rates, instruction throughput, vectorization ratios, I/O rates,
or communication frequencies and volumes) is a foundation of any PE-oriented user
support in academic HPC centers. This joint project, funded by the National High
Performance Computing Alliance (NHR), targets the formulation of standards
and interfaces for a job-specific performance monitoring infrastructure. This
covers data formats, application programming interfaces, and guidelines and best
practices for UI
presentation and job classification. Two job-specific monitoring frameworks (PIKA
and ClusterCockpit) implement and test the developed standards. All eight NHR
centers participate in the meetings; two centers receive project funding
and two more provide resources to the project. The project is open to
external partners: many centers from HPC.NRW as well as colleagues from GSC
centers regularly join the monthly meetings. The project is approved on
an annual basis by the NHR alliance.
Contributors:
- Jan Eitzinger (NHR@FAU)
- Frank Winkler (Center for Information Services and High Performance Computing (ZIH))
Format
On-site