High-Performance and Smart Networking Technologies for HPC and AI
Sunday, May 12, 2024 9:00 AM to 1:00 PM · 4 hr. (Europe/Berlin)
Hall Y2 - 2nd floor
Tutorial
Emerging Computing Technologies · Extreme-scale Systems · Heterogeneous System Architectures · Interconnects and Networks
Information
High-performance networking technologies are generating a
lot of excitement around building next-generation High-End
Computing (HEC) systems for HPC and AI with GPGPUs,
accelerators, and Data Center Processing Units (DPUs), to
serve a variety of application workloads. This tutorial will
provide an overview of these emerging technologies, their
architectural features, current market standing, and
suitability for designing HEC systems. It will start with a
brief overview of the InfiniBand (IB), High-Speed Ethernet
(HSE), RoCE, and Omni-Path interconnects.
An in-depth overview of the architectural features of these
interconnects will be presented, with associated hands-on
exercises. This will be followed by an overview of the
emerging NVLink, NVLink2, NVSwitch, EFA, and Slingshot
architectures. We will then present advanced features of
commodity high-performance networks that enable performance
and scalability, followed by an overview of enhanced
offload-capable network adapters such as DPUs/IPUs
(Smart NICs) and their capabilities and features. Next, we
will give an overview of software stacks for high-performance
networks, such as OpenFabrics Verbs, Libfabric, and UCX, and
compare the performance of these stacks. Finally, we will
present the challenges in designing an MPI library for these
interconnects, along with solutions and sample performance
numbers.
Format
On-site
Targeted Audience
This tutorial is targeted at various categories of people (newcomers, managers, administrators) working in the areas of high-performance communication and I/O, storage, networking, middleware, virtualization, cloud computing, deep learning, big data, and applications related to high-end systems.
Beginner Level
60%
Intermediate Level
40%
Prerequisites
There is no fixed prerequisite. Attendees with a general
knowledge of high-performance computing, networking, storage, and
related issues will be able to understand and appreciate
the material. The tutorial is designed so that attendees are
exposed to the topics in a smooth and progressive manner.
Attendees should have a laptop or tablet to log in to a remote HPC
system for the hands-on portion of the tutorial.