Performance Evaluation of HBM2 on an Intel Stratix 10 MX device for HPC Applications

HPC System Architecture



High-Level Synthesis (HLS) lowers the difficulty of developing FPGA hardware by allowing hardware logic to be described in software languages such as C, C++, and OpenCL. HLS also enables HPC application developers to implement their applications on FPGA systems. However, the memory bandwidth of an FPGA is lower than that of other accelerators used in HPC, such as GPUs. An FPGA board for HPC with DDR4 memory had a maximum memory bandwidth of only 76.8 GB/s. Recently, FPGA chips with High Bandwidth Memory 2 (HBM2) have become available. HBM2 uses a 3D-stacked memory structure and aggregates many channels to obtain high bandwidth, providing up to 512 GB/s of memory bandwidth in the latest Intel Stratix 10. This is still only around a quarter of the memory bandwidth of a GPU, but the ratio is much better than before. In this poster, we evaluate the performance of HBM2 in an Intel Stratix 10 MX FPGA. We implement a tester module that supports not only sequential access but also stride access, which is widely used in HPC applications. In addition to the performance evaluation, we discuss how to utilize the many memory channels of HBM2 and propose a memory subsystem for HBM2 and HPC applications in an FPGA.
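
To illustrate why stride access deserves separate attention from sequential access on a many-channel memory, the following minimal Python sketch generates both address patterns and counts how many requests land on each channel. It is not the tester module described in this poster. The function names, the 32-channel count, the 32-byte word size, and the simple block-interleaved address-to-channel mapping are all assumptions made only for illustration.

```python
from collections import Counter


def stride_addresses(base, count, stride, word_bytes=32):
    """Return byte addresses of `count` words starting at `base`.

    `stride` is measured in words; stride=1 gives sequential access.
    word_bytes=32 is an assumed access width, not a value from the poster.
    """
    return [base + i * stride * word_bytes for i in range(count)]


def channel_of(addr, num_channels=32, interleave_bytes=32):
    """Hypothetical mapping: consecutive blocks rotate across channels."""
    return (addr // interleave_bytes) % num_channels


def channel_histogram(addrs, num_channels=32, interleave_bytes=32):
    """Count how many of the given addresses fall on each channel."""
    counts = Counter(channel_of(a, num_channels, interleave_bytes) for a in addrs)
    return [counts.get(c, 0) for c in range(num_channels)]


# Sequential access: 64 consecutive words.
seq_hist = channel_histogram(stride_addresses(0, 64, 1))

# Stride access: 64 words, each 32 words apart.
str_hist = channel_histogram(stride_addresses(0, 64, 32))

print("sequential:", seq_hist)
print("stride 32: ", str_hist)
```

Under these assumed parameters, the sequential pattern spreads its requests evenly over all channels, while a stride equal to the channel count sends every request to the same channel, so only one channel's share of the aggregate bandwidth can be used. A real memory subsystem may map addresses to channels differently.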