Energy Management and Optimization with EAR
Sunday, May 12, 2024 2:00 PM to 6:00 PM · 4 hr. (Europe/Berlin)
Hall Y12 - 2nd floor
Tutorial
Energy Management
Information
This tutorial presents the main features of the Energy Aware Runtime (EAR) software and is aimed at users of high-performance computing data centers that run both HPC and AI applications. EAR is an energy management software package installed in several European data centers, such as SuperMUC-NG (LRZ), Snellius (SURF), and now MN5 (BSC). EAR offers services for system energy monitoring (including system and node powercap), job-level energy and performance monitoring, and energy optimization through a runtime library. Examples of common HPC/AI applications will show users how to perform job-level energy accounting, energy and performance monitoring, application characterization, and finally energy optimization of their applications.
Specifically, we will include scientific applications that represent common CPU-intensive, memory-intensive, and GPU use cases. We will highlight how a user can monitor the energy usage of these applications with EAR by adding simple Slurm environment variables. During the tutorial we will show users how to correlate application metrics with energy results, helping them understand their applications. Some examples of EAR data visualization will be shown using Grafana and EAR job analytics tools.
Format
On-site
Targeted Audience
The tutorial targets HPC users and application developers interested in energy efficiency. No previous experience with EAR is required. The tutorial will use basic and advanced applications (already installed) at the SURF data center and will show how to benefit from EAR for energy optimization.
Beginner Level
100%
Prerequisites
Users should bring their own laptops to connect to Snellius, where EAR is installed. Already installed applications will be used, but we will consider running user-installed applications if there is enough time. Some exercises will be done by attendees, but others (such as manually changing the CPU frequency) are subject to potential cluster limitations.