Composable Infrastructure: Lessons Learned by SDSC
Wednesday, June 1, 2022 1:00 PM to 2:00 PM · 1 hr. (Europe/Berlin)
Hall E - 2nd Floor
HPC Workflows
In this BoF, we will explore why some research centers are moving to a composable architecture. The San Diego Supercomputer Center (SDSC) will share its experience with distributed computing architectures and the outcomes of implementing a composable, heterogeneous system. The Bright Computing division of NVIDIA (Bright), a global leader in automation and management software for edge-to-core-to-cloud high-performance computing, will give a brief technical demo. GigaIO, creator of a next-generation rack-scale data center architecture for AI and HPC solutions, will introduce the concept of composable heterogeneous computing environments, laying out the current challenges and the solutions available today.
The need to leverage new types of processors, accelerators, and AI algorithms on servers from different manufacturers, integrate with the cloud, extend to the edge, and host machine learning and data analytics applications requires a more flexible approach to the delivery of HPC resources. A paradigm shift of this magnitude reveals several challenges to automating a composable heterogeneous computing solution.
Because composable nodes do not exist as an inventory of physical nodes, some workload management systems will not even allow jobs that require them to be submitted, while others will accept such jobs but never actually run them. Another challenge is preventing workload managers from starving composable nodes of resources to the point that new nodes can no longer be composed.
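As an illustration of how a scheduler can be made aware of nodes that do not yet physically exist, one commonly cited workaround (a hypothetical sketch, not a description of SDSC's or Bright's actual setup) is Slurm's power-saving "cloud" node support: placeholder nodes are declared in the configuration, and a site-provided resume script, which could in principle call a composability fabric's API, materializes the hardware on demand. Node names, counts, and script paths below are invented for illustration.

```
# slurm.conf fragment (hypothetical): declare composable nodes that have
# no physical inventory yet. State=CLOUD keeps them schedulable without
# Slurm expecting them to be powered on and registered.
NodeName=comp[001-016] CPUs=64 RealMemory=256000 Gres=gpu:4 State=CLOUD
PartitionName=composable Nodes=comp[001-016] MaxTime=24:00:00 State=UP

# Site-provided hooks: ResumeProgram composes a node (e.g., by attaching
# devices over a PCIe fabric) before booting it; SuspendProgram releases
# the devices back to the pool once a node has been idle for SuspendTime.
ResumeProgram=/opt/site/bin/compose_node.sh
SuspendProgram=/opt/site/bin/decompose_node.sh
SuspendTime=600
ResumeTimeout=900
```

With placeholder nodes declared this way, jobs targeting the composable partition can be submitted and queued normally, and the starvation problem becomes a matter of tuning suspend timeouts so that released devices return to the pool for recomposition.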
This topic is highly relevant to HPC as it stands to increase resource utilization, leverage heterogeneity, enable resource sharing, boost system performance, reduce time to results, and drive down both operating and capital expenditures (OpEx and CapEx).
Contributors:
- Frank Würthwein (San Diego Supercomputer Center)
- Martijn de Vries (NVIDIA)
- James Cuff (GigaIO)
Format
On-site