BSC-wise Dynamic Resources Management
Wednesday, May 15, 2024 10:45 AM to 11:05 AM · 20 min. (Europe/Berlin)
Hall 4 - Ground floor
Focus Session
Parallel Programming Languages · Resource Management and Scheduling · Runtime Systems for HPC
Information
Several studies have demonstrated that malleability and dynamic resources increase the productivity of HPC facilities in terms of completed jobs per unit of time: updating the resources assigned to an application during its execution accelerates global job processing. Users of malleable applications also benefit directly when they run large workloads, since they obtain their results faster.
OmpSs-2@Cluster is the extension of OmpSs-2 that supports offloading OpenMP-style tasks among nodes, a viable alternative to MPI + OmpSs-2. Since data distribution and transfers are delegated to the runtime system, the programming model naturally supports dynamic resources.
PyCOMPSs/COMPSs is a task-based programming model for distributed computing. Compared to OmpSs-2@Cluster, it targets larger infrastructures and coarser-grained tasks. Like OmpSs-2@Cluster, the COMPSs runtime implements elasticity, adding compute nodes to or releasing them from the computing platform at runtime.
The Dynamic Management of Resources (DMR) framework is a programming layer on top of malleability technologies that exposes a simple, MPI-like syntax to users. In particular, DMR relies on DMRlib, a process-malleability solution that supports job reconfiguration, data redistribution, process management, execution resuming, and dynamic resources.
DLB exploits the malleability of the shared-memory programming model to mitigate load imbalance at the distributed-memory level (i.e., MPI+OpenMP). In particular, its LeWI module intercepts blocking MPI calls: when a process reaches a blocking call, LeWI automatically adjusts the number of OpenMP threads.
Format
On-site · On Demand
Beginner Level: 30%
Intermediate Level: 70%