Under the Hood of SYCL - An Initial Performance Analysis With an Unstructured-mesh CFD Application
Monday, June 28, 2021 2:05 PM to 2:25 PM · 20 min. (Africa/Abidjan)
Stream #4
Contributors:
- Stephen A. Jarvis (University of Birmingham)
- Gihan Mudalige (University of Warwick)
- Andrew M. B. Owenson (University of Warwick)
- Archie Powell (University of Warwick)
- Istvan Zoltan (Pazmany Peter Catholic University)
Abstract:
As the computing hardware landscape becomes more diverse and the complexity of hardware grows, a general-purpose parallel programming model capable of producing (performance-)portable code has become highly attractive. Intel's oneAPI suite, which builds on the SYCL standard, itself based on OpenCL, aims to fill this gap using a modern C++ API. In this paper, we use SYCL to parallelize MG-CFD, an unstructured-mesh computational fluid dynamics (CFD) code, to explore the current performance of SYCL. The code is benchmarked on a number of modern processor systems from Intel (including CPUs and the latest Xe GPU), AMD, ARM and Nvidia, using a variety of current SYCL compilers, with a particular focus on oneAPI and how it maps to Intel's CPU and GPU architectures. We compare against other programming models and approaches, including SIMD vectorization, OpenMP, MPI and CUDA. The results are mixed: the performance of this class of applications, when parallelized with SYCL, depends heavily on the target architecture and the compiler, but in many cases comes close to that of the currently prevalent parallel programming models. However, as with OpenCL, different parallelization strategies or code paths must still be written for different hardware to obtain the best performance.