Poster is on display and will be presented at the poster pitch session.
Although computer performance continues to improve, the ratio of memory bandwidth to computing performance (the Byte/Flop, or B/F, value) continues to decrease. Rising power consumption is a further obstacle to performance improvement. The Cerebras CS-2 system is an accelerator for deep learning built around the Wafer Scale Engine-2 (WSE-2), the world’s largest chip, with 749,715 homogeneous processing elements (PEs). The WSE-2 offers a high B/F value and high power efficiency. We have been investigating the feasibility of using the WSE-2 for computational science. This poster evaluates the computing performance, memory bandwidth, and power consumption of the entire WSE-2 using the STREAM benchmark. For the TRIAD kernel, the maximum computing performance is 3.45 Pflop/s in half precision and 1.26 Pflop/s in single precision, and the maximum memory bandwidth is 8.85 PB/s in half precision and 8.81 PB/s in single precision.
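For readers unfamiliar with STREAM, the TRIAD kernel and the B/F value can be illustrated with a minimal Python sketch. It assumes the conventional STREAM counting of two loads, one store, and two floating-point operations per element; the counting used for the WSE-2 measurements may differ. The function names (triad, triad_bytes_per_flop, triad_rates) are hypothetical and chosen only for illustration.

```python
def triad(b, c, scalar):
    """STREAM TRIAD kernel: a[i] = b[i] + scalar * c[i]."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def triad_bytes_per_flop(bytes_per_element):
    """B/F value of TRIAD under the conventional STREAM counting (an assumption)."""
    # Per element: load b[i], load c[i], store a[i] -> 3 memory accesses.
    bytes_moved = 3 * bytes_per_element
    # Per element: one multiply and one add -> 2 floating-point operations.
    flops = 2
    return bytes_moved / flops


def triad_rates(n, bytes_per_element, seconds):
    """Return (bandwidth in B/s, performance in flop/s) for n elements in `seconds`."""
    bandwidth = 3 * bytes_per_element * n / seconds
    flop_rate = 2 * n / seconds
    return bandwidth, flop_rate


if __name__ == "__main__":
    print(triad([1.0, 2.0], [3.0, 4.0], 2.0))
    print(triad_bytes_per_flop(2))  # half precision, 2 bytes per element
    print(triad_bytes_per_flop(4))  # single precision, 4 bytes per element
```

Under this counting convention, TRIAD needs 3 bytes per flop in half precision and 6 bytes per flop in single precision, which is why it stresses memory bandwidth far more than arithmetic throughput. The sketch does not attempt to reproduce the figures reported above.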
Contributors: