Abstract: High-Level Synthesis (HLS) reduces the difficulty of developing FPGA
hardware by allowing software languages such as C, C++, and OpenCL to be
used to create FPGA hardware logic. HLS also enables HPC application
developers to implement their applications on FPGA systems. However, the
memory bandwidth of an FPGA is lower than that of other accelerators used
in HPC, such as GPUs. An FPGA board for HPC with DDR4 memory offered at
most 76.8 GB/s of memory bandwidth. Recently, FPGA chips with High
Bandwidth Memory 2 (HBM2) have become available. HBM2 uses a 3D-stacked
memory structure and aggregates many channels to obtain high bandwidth,
reaching up to 512 GB/s in the latest Intel Stratix 10. This is still
only around a quarter of a GPU's memory bandwidth, but the ratio is much
better than before.
In this poster, we evaluate the performance of HBM2 in an Intel Stratix
10 MX FPGA. We implement a tester module that supports not only
sequential access but also stride access, which is widely used in HPC
applications. In addition to the performance evaluation, we discuss how
to utilize the many memory channels provided by HBM2 and propose a
memory subsystem for HBM2 and HPC applications on an FPGA.