

Performance portability on HPC accelerator architectures with modern techniques: The ParFlow blueprint
Thursday, July 1, 2021 12:45 PM to 1:00 PM · 15 min. (Africa/Abidjan)
Stream#1
Information
Contributors:
Abstract:
Rapidly changing heterogeneous supercomputer architectures pose a great challenge to many scientific communities trying to leverage the latest technology in high-performance computing. Implementation techniques that deliver both good performance and developer productivity, while keeping the codebase adaptable and maintainable in the long term, are therefore highly important. ParFlow, a widely used hydrologic model written in C, achieves these attributes by using Unified Memory with a pool allocator and by hiding the architecture-dependent code in preprocessor macros (the ParFlow eDSL). The implementation can use either a native CUDA backend or the Kokkos library, and it shows very good weak scaling, with up to 26x speedup from the NVIDIA A100 GPUs over hundreds of nodes on the new JUWELS Booster system at Jülich Supercomputing Centre.
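To illustrate the idea of hiding architecture-dependent code behind preprocessor macros, here is a minimal sketch in C. It is not ParFlow's actual eDSL: the names SKETCH_BOX_LOOP, SKETCH_ALLOC, SKETCH_FREE, SKETCH_USE_ACCELERATOR and fill_and_sum are hypothetical, and only a host (CPU) backend is shown. The assumption is that an accelerator build would supply alternative definitions of the same macros, so the calling code stays unchanged.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical backend switch. Only the host path is implemented here;
 * an accelerator build would provide its own macro definitions. */
#ifdef SKETCH_USE_ACCELERATOR
#error "Accelerator backend is not part of this sketch"
#else
/* Host backend: the loop macro expands to plain nested loops.
 * The loop body is passed variadically so it may contain commas. */
#define SKETCH_BOX_LOOP(i, j, k, nx, ny, nz, ...)   \
  for (int k = 0; k < (nz); k++)                    \
    for (int j = 0; j < (ny); j++)                  \
      for (int i = 0; i < (nx); i++)                \
      { __VA_ARGS__ }

/* Host backend: ordinary zero-initialized heap allocation. An accelerator
 * backend could route this through a managed-memory pool instead
 * (an assumption of this sketch). */
#define SKETCH_ALLOC(type, count) ((type *)calloc((count), sizeof(type)))
#define SKETCH_FREE(ptr) free(ptr)
#endif

/* Application code uses only the macros, never a backend API directly.
 * Fills an nx*ny*nz box with i + j + k and returns the sum of all cells,
 * or -1.0 if the allocation fails. */
double fill_and_sum(int nx, int ny, int nz)
{
  double *data = SKETCH_ALLOC(double, (size_t)nx * ny * nz);
  if (data == NULL)
    return -1.0;

  double sum = 0.0;
  SKETCH_BOX_LOOP(i, j, k, nx, ny, nz,
    size_t idx = ((size_t)k * ny + j) * nx + i;
    data[idx] = (double)(i + j + k);
    /* A serial accumulation is fine on the host; a parallel backend
     * would need a proper reduction here. */
    sum += data[idx];
  );

  SKETCH_FREE(data);
  return sum;
}
```

Calling fill_and_sum(2, 2, 2) visits eight cells and returns 12.0. The point of the design is that fill_and_sum contains no backend-specific calls, so porting it to another architecture would mean changing only the macro definitions.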