

User Support for Full-System Execution on the Supercomputer Fugaku
Wednesday, June 24, 2026 3:45 PM to 5:15 PM · 1 hr. 30 min. (Europe/Berlin)
Foyer D-G - 2nd Floor
Project Poster
Community Engagement · Compiler and Tools for Parallel Programming · Extreme-scale Systems · Resource Management and Scheduling · Runtime Systems for HPC
Information
Poster is on display.
This poster outlines RIST's user-support activities for full-system executions on the supercomputer Fugaku, a 158,976-node Arm-based system. Full-system executions allocate nearly the entire machine to a single project and are conducted only twice a year. Over the last two years, RIST has supported eight awarded projects, providing guidance from application preparation to performance tuning.
The call for full-node-scale simulations includes strict requirements, such as prior completion of a half-node-scale run and eligibility for future Gordon Bell Prize submission. Successful execution requires extremely high parallelism, hybrid MPI/OpenMP programming, balanced loads, optimized MPI communication, distributed I/O, uniform memory usage, and robust runtime monitoring. Among these, MPI communication optimization emerged as one of the most critical challenges at extreme scale.
To mitigate communication bottlenecks on Fugaku’s Tofu-D interconnect, RIST developed FugakuNodeMappingTools, which automatically generates optimized rank-mapping files. The tool arranges processes so that intensive communication stays within local 2x2x1 Tofu-unit blocks and hop counts along the Z dimension are minimized. This placement significantly improves the efficiency of communication-intensive collectives such as MPI_Alltoall. In real projects, the tool has delivered 3.0–5.5x performance improvements and contributed to the successful completion of full-system executions.
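The block-confinement idea can be illustrated with a minimal sketch. This is not the FugakuNodeMappingTools implementation: the function names `blocked_rank_order` and `format_rank_map` are hypothetical, the logical grid is simplified to three dimensions, and the one-coordinate-per-line output is an assumed text layout rather than a documented file format. The sketch only shows how consecutive MPI ranks can be enumerated so that each group of four lands inside one 2x2x1 block.

```python
from itertools import product


def blocked_rank_order(shape, block=(2, 2, 1)):
    """Return (x, y, z) node coordinates in rank order.

    Consecutive ranks are placed inside the same block, so ranks that
    communicate heavily with their neighbours in rank order stay local.
    Hypothetical helper; a simplified 3D logical grid is assumed.
    """
    nx, ny, nz = shape
    bx, by, bz = block
    if nx % bx or ny % by or nz % bz:
        raise ValueError("grid shape must be a multiple of the block shape")

    order = []
    # Outer loop: walk over block origins (X varies fastest, Z slowest).
    for oz, oy, ox in product(range(0, nz, bz), range(0, ny, by), range(0, nx, bx)):
        # Inner loop: walk over the nodes inside the current block.
        for dz, dy, dx in product(range(bz), range(by), range(bx)):
            order.append((ox + dx, oy + dy, oz + dz))
    return order


def format_rank_map(order):
    """Render one coordinate per line; line i holds the node for rank i.

    The "(x,y,z)" layout is an assumption made for illustration.
    """
    return "\n".join(f"({x},{y},{z})" for x, y, z in order)


if __name__ == "__main__":
    print(format_rank_map(blocked_rank_order((4, 4, 2))))
```

With a 4x4x2 grid, ranks 0 to 3 fill the first 2x2x1 block before rank 4 moves to the next block along X. A real mapping would also have to account for the full Tofu-D coordinate system and the application's actual communication pattern.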
Future work includes further enhancement of tuning methodologies and broader support for full- and half-node-scale simulations.
Contributors:
Format
on-demand, on-site
