Optimization of Heterogeneous Load Balancing - Cooperative Utilization of SIMD CPUs and GPUs for Lattice Boltzmann Methods in OpenLB
Monday, May 13, 2024 3:00 PM to Wednesday, May 15, 2024 4:00 PM · 2 days 1 hr. (Europe/Berlin)
Foyer D-G - 2nd floor
Research Poster
Chemistry and Materials Science, Computational Physics, Heterogeneous System Architectures, Novel Algorithms, Optimizing for Energy and Performance
Information
Poster is on display and will be presented at the poster pitch session.
Lattice Boltzmann Methods (LBM) are particularly well suited to highly parallel computational fluid dynamics simulations on both SIMD CPUs and GPUs. Although heterogeneous systems combining CPUs and GPUs are ubiquitous in high-performance computing (HPC), the computationally dominant collide-and-stream loop commonly runs on either the CPUs or the GPUs alone. This poster summarizes a novel approach that uses genetic programming for cost-aware optimization of spatial domain decompositions targeting heterogeneous execution environments. It showcases the implementation and performance of the genetic algorithm for spatial decomposition, as well as the rank assignment approaches derived from it. The resulting comprehensive load balancing strategy is implemented in the open source LBM framework OpenLB and applied to turbulent flow reference cases, including a multi-physics reactive mixer benchmark. Evaluating its computational performance on heterogeneous HPC nodes yields speedups of up to 87% compared to homogeneous GPU-only execution.
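To make the idea of cost-aware assignment concrete, the following is a minimal Python sketch of a toy genetic algorithm that assigns equally sized lattice blocks to one CPU and one GPU so that the slower device finishes as early as possible. It is an illustration only and not the poster's algorithm or OpenLB's API: the function names, the cost model (cells divided by a fixed throughput), the block sizes, and the throughput figures are all assumptions chosen for the example.

```python
import random

def makespan(assignment, block_cells, throughput):
    """Time until the slowest device finishes: assigned cells / cells per second."""
    load = [0.0] * len(throughput)
    for block, device in enumerate(assignment):
        load[device] += block_cells[block]
    return max(cells / rate for cells, rate in zip(load, throughput))

def evolve_assignment(block_cells, throughput, seed_assignment,
                      population_size=30, generations=200,
                      mutation_rate=0.1, seed=0):
    """Toy genetic algorithm (hypothetical helper) minimizing the makespan."""
    rng = random.Random(seed)
    n_blocks, n_devices = len(block_cells), len(throughput)

    def cost(candidate):
        return makespan(candidate, block_cells, throughput)

    # Start from the given baseline plus random assignments.
    population = [list(seed_assignment)] + [
        [rng.randrange(n_devices) for _ in range(n_blocks)]
        for _ in range(population_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=cost)
        # Elitism: the better half survives unchanged, so the best
        # candidate found so far is never lost.
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_blocks)      # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_blocks):             # per-block mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_devices)
            children.append(child)
        population = parents + children
    return min(population, key=cost)

# Assumed example: 12 blocks of 64^3 cells, device 0 = CPU, device 1 = GPU.
# The throughput values are placeholders, not measurements.
block_cells = [64 ** 3] * 12
throughput = [1.0e9, 4.0e9]
gpu_only = [1] * len(block_cells)

best = evolve_assignment(block_cells, throughput, gpu_only)
print("GPU-only time:", makespan(gpu_only, block_cells, throughput))
print("Evolved time: ", makespan(best, block_cells, throughput))
```

Because the GPU-only baseline is part of the initial population and the best candidates are carried over unchanged, the evolved assignment is never worse than GPU-only execution under this cost model. A realistic cost model would also need to account for communication between blocks on different devices, which this sketch ignores.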
Contributors:
Format
On-site