Artificial Intelligence and Machine Learning for HPC Workload Analysis

Wednesday, May 15, 2024 1:00 PM to 2:00 PM · 1 hr. (Europe/Berlin)
Hall G1 - 2nd floor
Birds of a Feather
Development of HPC Skills · High-Performance Data Analytics · ML Systems and Tools · Performance Measurement

Information

HPC systems already produce terabytes of monitoring, usage, and performance data each day, ranging from low-level hardware telemetry and error reports, to hardware performance counters, to job scheduling and system logs, to natural language text from administrator troubleshooting tickets and notes. Future systems will be even larger and more complex, which will make it harder to monitor them and to characterize user behavior on them. Meanwhile, machine learning and artificial intelligence techniques have begun to prove effective at characterizing and extracting knowledge from large, complex datasets, although these efforts are only starting to realize their full potential across a wide variety of domains. For these reasons, we propose the First BoF on Artificial Intelligence and Machine Learning for HPC Workload Analysis. This BoF will provide a much-needed opportunity not only to discuss cutting-edge research ideas but also to bring together researchers working across data science, machine learning, statistics, applied mathematics, systems design, systems monitoring, systems resilience, and hardware architecture. The BoF aims to help the community advance toward better and more efficient monitoring and understanding of how large-scale computing systems are used.
Format
On-site
Targeted Audience
With rising interest in using machine learning techniques to better understand and design large-scale computing facilities, the audience for this BoF is intentionally broad and inclusive, ranging from machine learning and systems experts to students, and spanning industry, academia, and government.