Integration of Deep Learning APIs into existing performance analysis tools
Tuesday, May 31, 2022 9:00 AM to 6:30 PM · 9 hr. 30 min. (Europe/Berlin)
Foyer 3 + H - Ground Floor
Information
Deep Learning APIs such as TensorFlow or PyTorch usually do not guarantee efficient use of HPC resources, since these APIs were not originally developed for HPC systems. With existing HPC tools, analyzing the efficiency of Deep Learning applications is more difficult than analyzing classical HPC applications. HPC tools are usually designed for programming paradigms such as MPI, OpenMP, and CUDA. Although these analysis tools also work in principle with Deep Learning applications, their data must be evaluated and interpreted differently. In particular, they usually cannot trace performance characteristics back to the underlying model.
This project intends to close this gap by combining established API stacks such as "Keras via TensorFlow" or "TensorFlow via Horovod" with existing HPC analysis solutions (for example Score-P or Likwid). The focus is not on developing new software tools, but on making the existing software infrastructure more accessible and usable for machine learning users across platforms. The primary goal of this project is to formulate and communicate best practices for running Deep Learning workloads efficiently on NHR high-performance computers. To this end, the project's core findings will be prepared and presented to users in two NHR-wide training courses in Q2/2022 and Q4/2022.
In the first year of the project, Deep Learning APIs such as TensorFlow and PyTorch were evaluated together with performance analysis tools and APIs such as Score-P and NVTX. A first prototypical benchmark provided by ZIB, together with DeepSpeech, served as the applications under test.
Contributors:
- Holger Brunst (Technische Universität Dresden, ZIH)
- Thomas Steinke (Zuse Institute Berlin (ZIB))
- Sebastian Döbel (Technische Universität Dresden, ZIH)
Format
On-site