Omni-Path and Ultra Ethernet: A Convergence Solution for HPC and AI Networking

Tuesday, May 14, 2024 4:00 PM to 4:20 PM · 20 min. (Europe/Berlin)

Hall H, Booth L01

HPC Solutions Forum

AI Applications powered by HPC Technologies · Emerging Computing Technologies · Interconnects and Networks

Information

The data requirements of High Performance Computing (HPC) and Artificial Intelligence (AI) are stressing traditional data center networks. HPC workloads generate diverse communication patterns, while AI relies on high-bandwidth collectives that benefit from short tail latency. Both workloads are well supported by specialized fabrics like Omni-Path, but they present challenges for Ethernet networks that were not designed for modern communication patterns. Efficiency in HPC and AI can be improved by optimizing algorithms, enhancing parallelism, and improving data locality. Co-development of advanced hardware features and lightweight, scalable middleware is crucial. Omni-Path, a dedicated high-performance network technology, has demonstrated efficacy in managing both HPC and AI workloads. Ultra Ethernet has emerged as a promising solution, aiming to reconcile the widespread use of Ethernet with the robust HPC/AI performance provided by Omni-Path. By offering higher data rates, reduced latency, and improved congestion control, Ultra Ethernet could efficiently support both HPC and AI workloads within a mixed Ethernet infrastructure. In high-performance networking, advancements in high-speed fixed-function logic and signaling, programmable offloads, and CXL composability are important considerations in a network design that serves the widest possible range of application characteristics. Ultra Ethernet-compatible Omni-Path will continue to advance the state of the art in network management, resource allocation, and workload scheduling for HPC, and advancements pioneered by Omni-Path will be crucial for Ultra Ethernet’s success.

HPC Solutions Forum Questions

Are technologies and configurations for AI and HPC converging or diverging? Is it possible to serve both adequately and efficiently in the same environment?

What can be done to make HPC and AI more efficient?

What is the biggest pending advancement in high-performance networking: DPUs, composability, or something else?

Format

On-site

Cornelis Networks

D10

Cornelis Networks is a technology leader delivering purpose-built, high-performance fabrics accelerating High Performance Computing, High Performance Data Analytics, and Artificial Intelligence workloads in the Cloud and in the Data Center. The company’s products enable scientific, academic, governmental, and commercial customers to solve some of the world’s toughest challenges by efficiently focusing the computational power of many processing devices at scale on a single problem, simultaneously improving both result accuracy and time-to-solution for their most complex application workloads. Cornelis Networks delivers its end-to-end interconnect solutions worldwide through an established set of server OEM and channel partners.

Speakers

Charles Archer

Chief Technology Officer, Cornelis Networks

Registered attendees

Abhishek Sharma

Chief Engineer, Norwegian Meteorological Institute

Alexander Esser

IT specialist, Ruhr-Universität Bochum

Alexander Sinn

Student, N/A