Lifting energy-efficiency in supercomputing to the next level

Wednesday, June 30, 2021 1:45 PM to 2:00 PM · 15 min. (Africa/Abidjan)
Next Generation HPC Components · Accelerators · HPC's Role in the Energy Revolution · Extreme-Scale Parallelism · HPC System Architecture

Information

At the Leibniz Supercomputing Centre (LRZ), our IT systems consume up to five megawatts of electricity at peak times, up to four of which power our tier-0 flagship HPC system, SuperMUC-NG. To ensure that this energy is used as efficiently as possible and to further reduce energy requirements, we have implemented a four-pillar concept in our global energy optimization strategy. For more than a decade, we and our technology partners, IBM initially and now Lenovo, have dedicated ourselves to research and development that continuously improves our building infrastructure, HPC system software and hardware, and HPC application optimization. At the core of this strategy is our warm-water cooling approach, of which we have installed several iterations since 2012, including the first Lenovo Neptune® branded systems in 2018.

In this talk, we will describe our energy optimization strategy and how we have implemented it together with Lenovo over the past years. Besides warm-water cooling, this includes, for example, Lenovo's Energy Aware Runtime (EAR) software and our open-source monitoring tool, the Data Centre Data Base (DCDB). We will also give a preview of our upcoming system, SuperMUC-NG Phase 2, which is designed for AI applications and based on Intel Xeon Scalable processors (codenamed Sapphire Rapids) and Intel's upcoming GPU “Ponte Vecchio”, as well as distributed asynchronous object storage (DAOS), leveraging 3rd Gen Intel Icelake processors integrated into Lenovo's SD650-I v3.