High-Performance and Smart Networking Technologies for HPC and AI

Sunday, May 12, 2024 9:00 AM to 1:00 PM · 4 hr. (Europe/Berlin)
Hall Y2 - 2nd floor
Tutorial
Emerging Computing Technologies · Extreme-scale Systems · Heterogeneous System Architectures · Interconnects and Networks

Information

High-performance networking technologies are generating considerable excitement for building next-generation High-End Computing (HEC) systems for HPC and AI that combine GPGPUs, accelerators, and Data Center Processing Units (DPUs) and serve a variety of application workloads. This tutorial will provide an overview of these emerging technologies, their architectural features, current market standing, and suitability for designing HEC systems. It will start with a brief overview of the InfiniBand (IB), High-Speed Ethernet (HSE), RDMA over Converged Ethernet (RoCE), and Omni-Path interconnects. An in-depth overview of the architectural features of these interconnects will then be presented, with associated hands-on exercises. This will be followed by an overview of the emerging NVLink, NVLink2, NVSwitch, EFA, and Slingshot architectures. We will then present advanced features of commodity high-performance networks that enable performance and scalability, and provide an overview of enhanced offload-capable network adapters such as DPUs/IPUs (Smart NICs), their capabilities, and their features. Next, we will give an overview of software stacks for high-performance networks, such as OpenFabrics Verbs, Libfabric, and UCX, and compare their performance. Finally, we will present challenges in designing MPI libraries for these interconnects, along with solutions and sample performance numbers.
Format
On-site
Targeted Audience
This tutorial is targeted at various categories of people (newcomers, managers, administrators) working in the areas of high-performance communication and I/O, storage, networking, middleware, virtualization, cloud computing, deep learning, big data, and applications related to high-end systems.
Beginner Level
60%
Intermediate Level
40%
Prerequisites
There is no fixed prerequisite. Attendees with a general knowledge of high-performance computing, networking, storage, and related issues will be able to understand and appreciate the tutorial. It is designed so that attendees are exposed to the topics in a smooth and progressive manner. Attendees should bring a laptop or tablet to log in to a remote HPC system for the hands-on portion of the tutorial.
