Steering Customized AI Architectures for HPC Scientific Applications

Tuesday, May 23, 2023 11:35 AM to 12:00 PM · 25 min. (Europe/Berlin)
Hall F - 2nd Floor
Research Paper
Climate and Weather Modeling · Emerging HPC Processors and Accelerators · Numerical Libraries · Parallel Programming Languages · Performance Modeling and Tuning

Information

AI hardware technologies have revolutionized computational science. While they have mostly been used to accelerate deep learning training and inference for machine learning, HPC scientific applications do not directly benefit from these specialized hardware features unless AI-based components are introduced into their simulation workflows, for instance, as a replacement for their numerical solvers. This paper takes another direction in an attempt to democratize customized AI architectures for HPC scientific computing. The main idea is to demonstrate how legacy applications can leverage these AI engines after a necessary algorithmic redesign. It is critical that the resulting software implementations map onto the underlying memory-austere hardware architectures to extract the expected performance. To facilitate this process, we promote the matricization technique for restructuring codes (1) by exploiting data sparsity via algebraic compression and (2) by expressing the critical computational phases in terms of tile low-rank matrix-vector multiplications (TLR-MVM) and batch matrix-matrix multiplications (batch GEMM). Algebraic compression reduces the memory footprint so that the data fits into small local caches/memories, while batch execution ensures high occupancy. We highlight how we can steer the Graphcore AI-focused Wafer-on-Wafer Intelligence Processing Unit (IPU) to deliver high performance for both operations. We conduct a performance benchmarking campaign of these two matrix operations, which account for most of the elapsed time of four surrogate applications in computational astronomy, seismic imaging, wireless communications, and climate/weather predictions. We report bandwidth and execution rates with speedup factors up to 150X/14X/25X/40X, respectively, on the Graphcore IPU compared to other systems.
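
To illustrate the restructuring described in the abstract, below is a minimal NumPy sketch of a tile low-rank matrix-vector multiply (TLR-MVM). The tile size, ranks, and random low-rank factors are illustrative assumptions, not the paper's implementation or IPU code; in the paper's setting, the many small factor products would typically be grouped and dispatched as batch GEMM operations to keep the hardware occupied.

import numpy as np

def tlr_mvm(U, V, x, tile_size):
    # Computes y = A @ x, where tile (i, j) of A is approximated by the
    # low-rank product U[i][j] @ V[i][j].T (algebraic compression).
    nt = len(U)                      # number of tile rows/columns
    y = np.zeros(nt * tile_size)
    for i in range(nt):
        acc = np.zeros(tile_size)
        for j in range(nt):
            xj = x[j * tile_size:(j + 1) * tile_size]
            # Two skinny products replace one dense tile_size-by-tile_size product,
            # shrinking both memory traffic and footprint.
            acc += U[i][j] @ (V[i][j].T @ xj)
        y[i * tile_size:(i + 1) * tile_size] = acc
    return y

# Illustrative example: 4x4 grid of 64x64 tiles, each compressed to rank 4.
nt, b, k = 4, 64, 4
U = [[np.random.rand(b, k) for _ in range(nt)] for _ in range(nt)]
V = [[np.random.rand(b, k) for _ in range(nt)] for _ in range(nt)]
x = np.random.rand(nt * b)
y = tlr_mvm(U, V, x, b)
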
Format
On-site · On Demand

Beginner Level: 20%
Intermediate Level: 60%
Advanced Level: 20%