Study on the Facility Enhancement by Operational Data Analysis: A Comparison of the Operations in the K Computer and Fugaku

Variable Precision, HPC Workflows, Parallel Math Library, HPC & ML in Industry, Quantum Technologies & Simulation, Parallel File Systems, ML Systems & Tools, Extreme-Scale Parallelism, Performance & Correctness Tools, HPC for Big Data Analytics, Performance Modeling & Tuning, Visualization & Virtual Reality, HPC System Architecture

Information

Contributors:
Abstract:

For several years, beginning with the operation of the K computer, we have been collecting various facility metrics to monitor and maintain the equipment, and we have extracted insights from the collected data. Our supercomputing center provides exascale computing resources with the supercomputer Fugaku, which was developed as the successor to the K computer. Fugaku is designed to consume up to 37 MW of power, approximately three times as much as the K computer. To run the new system in facilities that were installed 10 years ago, we needed a retrofit approach to enhance the power supply and cooling systems. Prior to the official service of Fugaku, we provided a trial-use period for users and collected various metrics for Operational Data Analysis (ODA), as we did during the K computer operation. We analyzed the metrics from this period in 2020 to evaluate the enhancements. Based on the results, we found that the cooling system keeps up with abrupt megawatt-scale increases in power consumption, even though Fugaku requires a larger cooling capacity than the K computer. In particular, the additional chillers work well. In this poster, we show preliminary analysis results comparing the K computer and Fugaku cases, and we report what we have learned from the prelaunch service.