xbat – An Easy-to-Use and Universally Applicable Benchmarking Automation Tool for HPC Software Within the Project hpc.bw (dtec.bw)

Tuesday, June 10, 2025 3:00 PM to Thursday, June 12, 2025 4:00 PM · 2 days 1 hr. (Europe/Berlin)
Foyer D-G - 2nd floor
Project Poster
Optimizing for Energy and Performance · Performance Measurement · System and Performance Monitoring

Information

Poster is on display.
Benchmarking applications on high-performance computing (HPC) systems is essential for optimising runtime, reducing energy consumption, and ensuring efficient hardware utilisation. However, accessing and interpreting performance metrics can be challenging and error-prone, particularly for users without detailed hardware expertise. To address this, we present xbat (extended benchmarking automation tool), developed by MEGWARE Computer Vertrieb und Service GmbH, as an easy-to-use, universally applicable, powerful tool to automate benchmarking and simplify performance analysis for HPC users of all skill levels.

This poster provides an overview of xbat’s architecture, features, and case studies within the project hpc.bw (dtec.bw), showcasing its ability to simplify benchmarking workflows and assist users in achieving optimal performance across diverse HPC applications.

xbat streamlines the entire benchmarking workflow, from job configuration and submission to data collection and analysis. Benchmarks can be submitted via a user-friendly web interface or the Slurm command-line interface (CLI). The tool integrates seamlessly into existing HPC infrastructures through a containerised deployment model and lightweight daemons, requiring minimal setup. Extensive online documentation, guides and a public demo instance are available. In addition to its benchmarking capabilities, xbat supports collaboration by allowing users to share configurations and results. Its intuitive design makes it accessible to beginners, while its comprehensive features cater to advanced users.

Up to 140 low-level performance metrics are collected, covering CPU, memory, I/O, GPU, FPGA, and energy usage. These metrics are gathered (mainly by LIKWID) at the finest available granularity, such as hardware threads and cores, and dynamically aggregated to higher levels, including NUMA domains, sockets, nodes, and jobs. This multi-level aggregation enables users to analyse performance at scales ranging from individual components to entire systems, providing insights tailored to specific use cases.
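
The multi-level aggregation described above can be illustrated with a minimal Python sketch. It is not xbat's implementation: the `Sample` record, its field names, and the `aggregate` helper are hypothetical, and the sketch assumes an additive metric (for example FLOP/s or memory bandwidth) where summing per-thread values is meaningful. Non-additive metrics such as clock frequency or temperature would need a mean or maximum instead.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One per-thread measurement (hypothetical record layout)."""
    node: str
    socket: int
    numa: int
    core: int
    thread: int
    value: float  # additive metric, e.g. FLOP/s on one hardware thread


# Topology levels from coarsest to finest.
LEVELS = ("node", "socket", "numa", "core", "thread")


def aggregate(samples, level):
    """Sum per-thread values up to the requested topology level."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(float)
    for s in samples:
        # The key is the path from the node down to the requested level.
        key = tuple(getattr(s, name) for name in LEVELS[:depth])
        totals[key] += s.value
    return dict(totals)


# Placeholder values for illustration only, not measurements.
samples = [
    Sample("n01", 0, 0, 0, 0, 1.0),
    Sample("n01", 0, 0, 0, 1, 2.0),
    Sample("n01", 0, 1, 1, 2, 3.0),
    Sample("n01", 1, 2, 2, 4, 4.0),
    Sample("n02", 0, 0, 0, 0, 5.0),
]

per_core = aggregate(samples, "core")
per_socket = aggregate(samples, "socket")
per_node = aggregate(samples, "node")
job_total = sum(per_node.values())
```

Keying each total by the full path from node to the requested level keeps, for instance, core 0 on node n01 distinct from core 0 on node n02, so the same function serves every level of the hierarchy.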

The xbat web interface acts as a central hub for benchmarking activities, presenting performance data through interactive, customisable graphs and detailed statistics. These can be exported as images, CSV or JSON, enabling users to easily include results in reports and presentations. By combining visual and statistical insights, xbat empowers users to understand application behaviour, identify inefficiencies, and evaluate the impact of different configurations and parameter settings.

We demonstrate xbat’s capabilities using benchmark applications of varying complexity and show that it can manage all aspects of the benchmarking workflow seamlessly. In particular, we focus on the open-source molecular dynamics research software ls1 mardyn, which can leverage large clusters with many nodes, and the closed-source mathematical optimisation package Gurobi, which is usually used on machines with a moderate number of cores. Both packages present unique challenges. Mixed-integer programming solvers, such as those integrated in the Gurobi software, exhibit significant performance variability: seemingly innocuous parameter changes and machine characteristics can affect the runtime drastically. ls1 mardyn, in turn, comes with the auto-tuning library AutoPas, which enables the selection of various node-level algorithms to compute molecular trajectories.
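
Performance variability of the kind described for mixed-integer programming solvers is commonly quantified by repeating a run several times and comparing the spread of runtimes to their mean. The following Python sketch shows one way to do that; the `variability` helper and the runtime values are illustrative assumptions and do not come from xbat or from any Gurobi measurement.

```python
import statistics


def variability(runtimes):
    """Summarise the spread of repeated runtimes (in seconds)."""
    mean = statistics.fmean(runtimes)
    stdev = statistics.stdev(runtimes)  # sample standard deviation
    return {
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,                        # coefficient of variation
        "spread": max(runtimes) / min(runtimes),   # slowest / fastest run
    }


# Placeholder runtimes for illustration only, not measurements.
runs = [10.0, 12.0, 8.0, 10.0]
summary = variability(runs)
```

A coefficient of variation close to zero indicates a stable benchmark, while larger values suggest that a single run is not a reliable basis for comparing configurations.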

xbat was released fully open-source in April 2025. Future enhancements include extended metric collection, increased focus on analysis and statistics, and automated bottleneck and anomaly detection.
Format: On Demand · On Site
