Optimization of Software Libraries for DFT Codes for Data-Parallel Processor Architectures
Tuesday, May 31, 2022 9:00 AM to 6:30 PM · 9 hr. 30 min. (Europe/Berlin)
Foyer 3 + H - Ground Floor
Information
Energy-efficient HPC architectures comprise highly data-parallel processors such as manycore CPUs with wide SIMD units, GPUs with many SIMT cores, and FPGAs with large numbers of logic units. Atomistic simulations based on Density Functional Theory (DFT) are among the key workloads on many HPC systems. Current opportunities for modernizing DFT codes for atomistic simulations include: i) making more efficient use of the on-node data parallelism provided by state-of-the-art hardware, and ii) extending code scalability to larger node counts.
Our project, funded by the German NHR (“Nationales Hochleistungsrechnen”, National High-Performance Computing), focuses on the widely applicable software libraries libxc, libint, and FFTXlib, which are used in DFT codes and beyond. To support data parallelism on multiple targets (CPUs, GPUs, and other accelerators programmable in high-level languages, such as FPGAs), we apply a previously developed methodology and implementation of automatic code generation (see, for example, libxc). For FFTXlib, we use specific knowledge of the DFT domain to optimize 3D FFTs; this approach also applies to use cases beyond DFT.
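The abstract does not say which DFT-specific knowledge is exploited in FFTXlib. One commonly cited example from plane-wave DFT codes is that wavefunction coefficients are non-zero only inside a cutoff sphere in reciprocal space, so a 3D transform decomposed into 1D transforms can skip the z-direction columns ("sticks") that lie entirely outside that sphere. The pure-Python sketch below illustrates only this assumed idea; it uses a naive O(n²) DFT instead of a real FFT for clarity, it is not FFTXlib's implementation, and all names (`dft3d_sticks`, `sphere_grid`, `cutoff`) are hypothetical.

```python
import cmath


def dft1d(x):
    """Naive 1D DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def dft3d_direct(a):
    """Reference 3D DFT computed by a direct triple sum (slow, for checking only)."""
    n = len(a)
    out = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for kx in range(n):
        for ky in range(n):
            for kz in range(n):
                s = 0j
                for x in range(n):
                    for y in range(n):
                        for z in range(n):
                            phase = (kx * x + ky * y + kz * z) / n
                            s += a[x][y][z] * cmath.exp(-2j * cmath.pi * phase)
                out[kx][ky][kz] = s
    return out


def dft3d_sticks(a):
    """Separable 3D DFT (z, then y, then x) that skips all-zero z-columns.

    Returns the transformed grid and the number of z-transforms performed.
    """
    n = len(a)
    b = [[list(a[x][y]) for y in range(n)] for x in range(n)]
    z_transforms = 0
    # Stage 1: 1D transforms along z, only on columns containing non-zeros.
    # An all-zero column transforms to all zeros, so skipping it is exact.
    for x in range(n):
        for y in range(n):
            if any(v != 0 for v in b[x][y]):
                b[x][y] = dft1d(b[x][y])
                z_transforms += 1
    # Stage 2: 1D transforms along y for every (x, z).
    for x in range(n):
        for z in range(n):
            col = dft1d([b[x][y][z] for y in range(n)])
            for y in range(n):
                b[x][y][z] = col[y]
    # Stage 3: 1D transforms along x for every (y, z).
    for y in range(n):
        for z in range(n):
            col = dft1d([b[x][y][z] for x in range(n)])
            for x in range(n):
                b[x][y][z] = col[x]
    return b, z_transforms


def freq(i, n):
    """Map grid index i to a signed frequency (0, 1, ..., -2, -1)."""
    return i if i < n // 2 else i - n


def sphere_grid(n, cutoff):
    """Hypothetical test data: non-zero values only inside |g|^2 <= cutoff."""
    a = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for x in range(n):
        for y in range(n):
            for z in range(n):
                g2 = freq(x, n) ** 2 + freq(y, n) ** 2 + freq(z, n) ** 2
                if g2 <= cutoff:
                    a[x][y][z] = complex(1 + x, y - z)
    return a


n = 4
a = sphere_grid(n, cutoff=1)
ref = dft3d_direct(a)
out, used = dft3d_sticks(a)
max_err = max(abs(out[i][j][k] - ref[i][j][k])
              for i in range(n) for j in range(n) for k in range(n))
print(f"z-transforms: {used} of {n * n}, max error: {max_err:.1e}")
```

With the small cutoff used here, only 5 of the 16 z-columns contain data, so the first stage does less than a third of the dense work while matching the direct transform. In a distributed setting, the stages are typically separated by data transposes between nodes, which is where multi-node scalability becomes the limiting factor.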
We present first results of our ongoing work and demonstrate improvements of up to 3x in the performance and scalability of 3D FFTs in multi-node runs. For libxc, we present a newly designed and validated code generation workflow that produces code optimized for multiple compute-device architectures.
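The abstract does not describe the code generation workflow itself. As a loose illustration of the multi-target idea only, the sketch below emits a CPU kernel (C with an OpenMP `simd` loop) and a CUDA kernel from one shared expression, here the LDA (Slater) exchange energy density e_x(ρ) = -(3/4)(3/π)^(1/3) ρ^(4/3). The templates and the names `generate`, `EXPR`, and `lda_x` are hypothetical and do not represent libxc's actual generator.

```python
from string import Template

# Hypothetical single-source description: one output value per grid point i.
# LDA exchange energy density: -(3/4) * (3/pi)^(1/3) * rho^(4/3).
EXPR = "-0.75 * cbrt(3.0 / 3.14159265358979323846) * cbrt(rho[i]) * rho[i]"

# CPU target: plain C loop with an OpenMP SIMD hint for vectorization.
CPU_TEMPLATE = Template("""\
#include <math.h>
void ${name}(int n, const double *restrict rho, double *restrict out) {
  #pragma omp simd
  for (int i = 0; i < n; ++i) {
    out[i] = ${expr};
  }
}
""")

# GPU target: CUDA kernel with one thread per grid point.
GPU_TEMPLATE = Template("""\
__global__ void ${name}(int n, const double *__restrict__ rho,
                        double *__restrict__ out) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = ${expr};
  }
}
""")

TARGETS = {"cpu": CPU_TEMPLATE, "cuda": GPU_TEMPLATE}


def generate(name, expr, target):
    """Emit source code for one target from the shared expression."""
    return TARGETS[target].substitute(name=name, expr=expr)


for target in TARGETS:
    print(f"// ---- target: {target} ----")
    print(generate("lda_x", EXPR, target))
```

Keeping the physics in a single expression and pushing architecture details into per-target templates is one way to avoid maintaining hand-written variants per device; a real workflow would also need to validate each generated variant numerically against a reference.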
Contributors:
- Thomas Steinke (Zuse Institute Berlin (ZIB))
- Thomas D. Kühne (University of Paderborn)
- Bernd Meyer (Friedrich-Alexander-Universität Erlangen-Nürnberg)
Format
On-site