STREAM Benchmark on Cerebras WSE-2
Monday, May 13, 2024 3:00 PM to Wednesday, May 15, 2024 4:00 PM · 2 days 1 hr. (Europe/Berlin)
Foyer D-G - 2nd floor
Research Poster
Emerging Computing Technologies, Optimizing for Energy and Performance, Performance Measurement
Information
Poster is on display and will be presented at the poster pitch session.
Although the performance of recent computers has improved, the ratio of memory bandwidth to computing performance (the byte-per-flop, or B/F, value) continues to decrease. Increasing power consumption is a further obstacle to performance improvement. The Cerebras CS-2 system is an accelerator for deep learning built around the Wafer Scale Engine-2 (WSE-2), the world’s largest chip, with 749,715 homogeneous processing elements (PEs). The WSE-2 provides a high B/F value and high power efficiency. We have been investigating the feasibility of using the WSE-2 for computational science. This poster presents an evaluation of the computing performance, memory bandwidth, and power consumption of the entire WSE-2 using the STREAM benchmark. For the TRIAD kernel, the maximum computing performance is 3.45 Pflop/s in half precision and 1.26 Pflop/s in single precision, and the maximum memory bandwidth is 8.85 PB/s in half precision and 8.81 PB/s in single precision.
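For readers unfamiliar with STREAM, the following is a minimal sketch of the TRIAD kernel and of the usual STREAM accounting (three words moved and two floating-point operations per element). It is plain host-side Python for illustration only, not the authors' WSE-2 implementation; the function names `triad`, `triad_counts`, and `measure` are hypothetical, and the nominal 8-byte word size is an assumption that ignores Python's list overhead.

```python
import time


def triad(a, b, c, q):
    """STREAM TRIAD kernel: a[i] = b[i] + q * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def triad_counts(n, bytes_per_word):
    """Return (bytes moved, flops) for n elements under the STREAM convention.

    Per element: two reads (b, c) plus one write (a), and one multiply
    plus one add.
    """
    return 3 * n * bytes_per_word, 2 * n


def measure(n=100_000, bytes_per_word=8, q=3.0):
    """Time one TRIAD pass and return (bytes/s, flop/s, result array).

    bytes_per_word is a nominal value (assumption): 8 for double,
    4 for single, 2 for half precision.
    """
    a = [0.0] * n
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    triad(a, b, c, q)
    elapsed = time.perf_counter() - start
    nbytes, nflops = triad_counts(n, bytes_per_word)
    return nbytes / elapsed, nflops / elapsed, a


if __name__ == "__main__":
    bandwidth, flop_rate, _ = measure()
    print(f"bandwidth: {bandwidth:.3e} B/s, compute: {flop_rate:.3e} flop/s")
```

Under this convention the kernel itself moves 1.5 words per flop, so the bytes moved per flop scale with the word size: 3 bytes per flop at 2-byte (half) words and 6 bytes per flop at 4-byte (single) words.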
Contributors:
Format
On-site