Determining Parallel Application Execution Efficiency and Scaling using the POP Methodology

Sunday, May 12, 2024 9:00 AM to 1:00 PM · 4 hr. (Europe/Berlin)
Hall Y12 - 2nd floor
Tutorial
Heterogeneous System Architectures · Optimizing for Energy and Performance · Performance and Resource Modeling · Performance Measurement · Performance Tools and Simulators

Information

HPC application developers encounter significant challenges getting their codes to run correctly on leadership computer systems consisting of large numbers of interconnected multi-socket multicore processor nodes, often with attached accelerator devices. They also need effective tools and methods to track and assess their codes' execution performance as they prepare for production on current or prospective exascale computer systems. This tutorial presents the methodology developed and applied over several years within the EU/EuroHPC Centre of Excellence Performance Optimisation and Productivity (POP). Its focus is the hierarchy of execution efficiency and scaling metrics that identify the most critical issues and quantify the potential benefits of remedies. The metrics can be determined by a variety of tools and readily compared for applications in any language employing standard MPI, OpenMP/OpenACC and other multi-threading and offload paradigms. Using their own notebook computers, tutorial participants will follow exercises using widely deployed open-source tools and provided performance measurements of actual HPC application executions (ranging from CFD to neuroscience), preparing them to locate and diagnose efficiency and scalability issues in their own parallel application codes.
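As a rough illustration of how the top of such a metric hierarchy fits together, the following minimal sketch computes load balance, communication efficiency and parallel efficiency from per-process useful computation times, using the multiplicative definitions commonly published for the POP methodology. The function name `pop_parallel_efficiency`, its parameters and the sample timings are hypothetical and chosen for illustration only; in practice these values are derived by the performance tools from profile or trace measurements.

```python
from statistics import mean


def pop_parallel_efficiency(useful_times, runtime):
    """Compute top-level efficiency metrics for one execution.

    useful_times: per-process time (seconds) spent in useful computation,
                  i.e. outside the parallel runtime (assumed input).
    runtime:      total wall-clock time (seconds) of the measured region.
    """
    if not useful_times or runtime <= 0:
        raise ValueError("need at least one process and a positive runtime")

    avg_useful = mean(useful_times)
    max_useful = max(useful_times)
    if max_useful > runtime:
        raise ValueError("useful time cannot exceed the total runtime")

    # Load balance: how evenly useful work is spread across processes.
    load_balance = avg_useful / max_useful
    # Communication efficiency: share of the runtime that the busiest
    # process spends computing rather than communicating or waiting.
    communication_efficiency = max_useful / runtime
    # Parallel efficiency is the product of the two factors, which
    # equals average useful time divided by runtime.
    parallel_efficiency = load_balance * communication_efficiency

    return {
        "load_balance": load_balance,
        "communication_efficiency": communication_efficiency,
        "parallel_efficiency": parallel_efficiency,
    }


if __name__ == "__main__":
    # Hypothetical timings for four processes, not real measurements.
    metrics = pop_parallel_efficiency([8.0, 6.0, 10.0, 8.0], runtime=12.5)
    for name, value in metrics.items():
        print(f"{name}: {value:.2f}")
```

Because the metrics multiply, a low parallel efficiency can be traced to whichever factor is lowest, which is how the hierarchy points to the most critical issue.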
Format
On-site
Targeted Audience
Application developers striving for the best application performance on current HPC systems, particularly those preparing for imminent exascale computer systems; HPC support staff who assist application developers with performance tuning; others interested in performance tools and application tuning.
Beginner Level
50%
Intermediate Level
50%
Prerequisites
The level of the presentation, and particularly of the hands-on exercises, requires a general understanding of HPC applications using MPI and/or OpenMP (or some other form of multi-threading, tasking or offload of kernels), particularly in mixed mode. Attendees are not expected to be familiar with the featured performance analysis tools, in particular their associated instrumentation and measurement collection tools; however, such familiarity is certainly advantageous and highly recommended. Participants who wish to follow the hands-on exercises are expected to bring their own notebook computers. Only a standard web browser is required, although packages are available for Linux, macOS and Windows for those who wish to install the graphical tools. The example profile and event trace measurements for interactive exploratory analysis will be provided on USB memory sticks and for download.