Cross-layer Visualization of Network Communication for HPC Clusters
Wednesday, June 1, 2022 1:44 PM to 1:48 PM · 4 min. (Europe/Berlin)
Hall D - 2nd Floor
HPC Workflows
Information
Understanding and visualizing the full-stack performance trade-offs and interplay between the I/O filesystem, MPI libraries, the communication fabric, and the job scheduler is a challenging endeavor. Designing a holistic profiling and visualization method for HPC communication networks is difficult because different levels of communication coexist and interact with each other on the communication fabric. A breakdown of traffic is essential to understand the interplay of the different layers, along with the application's communication behavior, without losing a general view of network traffic.
Unfortunately, existing profiling tools are disjoint: they either visualize only a few levels of the HPC stack, which limits the insights they can provide, or they provide extremely detailed information that imposes a steep learning curve. The broad challenge is therefore: how can we improve visualization methods to enable holistic insight into the cross-stack metrics of the HPC communication stack, including the I/O filesystem, network fabric, and MPI communication counters, along with the job scheduler and fabric topology?
In this poster, we propose, implement, and compare visualization designs that enable holistic insight into the cross-stack metrics generated by INAM. We demonstrate novel benefits of our real-time cross-stack communication analysis for detecting bottlenecks and understanding communication performance. We then evaluate the performance of our visualization designs by scaling to larger clusters of 1700 nodes. Finally, we provide takeaways and elaborate on the trade-offs involved in using the visualization designs. One of our visualization designs has been publicly released and is freely available.
Contributors:
- Pouya Kousha (The Ohio State University)
- Dhabaleswar K. Panda (The Ohio State University)
Format
On-site