

PDI and DEISA : Tools to Decouple I/O Concerns Towards In-Situ Analysis
Tuesday, June 10, 2025 3:00 PM to Thursday, June 12, 2025 4:00 PM · 2 days 1 hr. (Europe/Berlin)
Foyer D-G - 2nd floor
Project Poster
Compiler and Tools for Parallel Programming, High-Performance Data Analytics, Runtime Systems for HPC
Information
Poster is on display.
High-performance simulations increasingly face a bottleneck due to the growing gap between CPU performance and I/O bandwidth. Traditional post-hoc processing struggles to manage the massive datasets generated by exascale simulations. In-situ analysis offers a solution by processing data as soon as it is produced, bypassing the limitations of disk I/O and fully leveraging high-performance computing (HPC) platforms. However, in-situ approaches often require complex setups and the development of parallel analysis codes. This poster presents PDI and DEISA, two tools designed to address these challenges by decoupling I/O concerns from simulation codes and enabling efficient in-situ data analysis.
PDI (Parallel Data Interface) is a library that decouples I/O concerns from high-performance simulation codes. Its declarative API lets a simulation expose data buffers and signal significant steps of its execution. The I/O operations themselves are described in a dedicated YAML file instead of being interleaved with the simulation code, which improves portability and maintainability. PDI's flexibility comes from its plugin system, which supports a variety of libraries and functionalities. For example, PDI provides plugins for:
· Data storage: HDF5 and NetCDF, for writing simulation data to the file system
· Data analysis: Python, for integrating Python-based analysis codes
· In-situ processing: DEISA, for direct coupling with a task-based analysis framework (Dask)
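To make the expose/notify idea concrete, here is a toy Python model of the pattern. It is a sketch under explicit assumptions: `ToyDataInterface`, `expose`, `event`, the plugin signature and the dictionary standing in for the YAML file are all hypothetical and are not PDI's API (real PDI has its own API and YAML schema). The point is only that the simulation loop names its data and its steps, while a separate configuration decides which plugins act on them.

```python
from typing import Any, Callable, Dict, List

# Hypothetical plugin signature: (event name, snapshot of exposed data) -> None.
Plugin = Callable[[str, Dict[str, Any]], None]


class ToyDataInterface:
    """Illustrative stand-in for a declarative data interface (not PDI's API)."""

    def __init__(self, config: Dict[str, List[str]], plugins: Dict[str, Plugin]) -> None:
        # config maps an event name to the plugins triggered by it; in PDI this
        # role is played by the YAML file.
        self.config = config
        self.plugins = plugins
        self.exposed: Dict[str, Any] = {}

    def expose(self, name: str, data: Any) -> None:
        # The simulation only makes a buffer visible under a name.
        self.exposed[name] = data

    def event(self, name: str) -> None:
        # The simulation signals a significant step; the configuration decides
        # which plugins react and what they do with the exposed data.
        for plugin_name in self.config.get(name, []):
            self.plugins[plugin_name](name, dict(self.exposed))


def simulation(io: ToyDataInterface, steps: int) -> None:
    # This loop contains no I/O logic and is identical for every configuration.
    field = [0.0] * 4
    for it in range(steps):
        field = [x + 1.0 for x in field]  # stand-in for a real solver update
        io.expose("iter", it)
        io.expose("field", field)
        io.event("end_of_step")


written: List[Dict[str, Any]] = []  # stand-in for an HDF5/NetCDF file
means: List[float] = []             # stand-in for a Python analysis result


def storage_plugin(event: str, data: Dict[str, Any]) -> None:
    written.append({"iter": data["iter"], "field": list(data["field"])})


def analysis_plugin(event: str, data: Dict[str, Any]) -> None:
    means.append(sum(data["field"]) / len(data["field"]))


plugins: Dict[str, Plugin] = {"storage": storage_plugin, "analysis": analysis_plugin}

# Same simulation code, two different "configuration files".
simulation(ToyDataInterface({"end_of_step": ["storage"]}, plugins), steps=3)
simulation(ToyDataInterface({"end_of_step": ["analysis"]}, plugins), steps=3)
print(means)  # [1.0, 2.0, 3.0]
```

Switching from storage to analysis only changes the dictionary passed in, never the `simulation` function; that is the separation of concerns described in the text, although real PDI configurations are YAML documents with their own structure.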
PDI is designed to add minimal overhead, with negligible impact on simulation performance. Because the I/O behavior is described in the YAML file, developers can integrate a variety of I/O functionalities without modifying the core simulation code.
DEISA (Dask-Enabled In-Situ Analytics) is a library that builds upon PDI to enable in-situ analysis by coupling MPI-parallel simulation codes with analyses written with Dask. DEISA employs a hybrid model that combines the Bulk Synchronous Parallel (BSP) paradigm, common in simulations, with a distributed task-based approach for analysis. This combination reduces complexity and leverages the strengths of both paradigms, while requiring only minimal changes to the simulation and analysis codes compared to their post-hoc counterparts. DEISA makes an established Python ecosystem, including NumPy, Pandas, scikit-learn, and matplotlib, available for in-situ use.
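The hybrid BSP/task-based idea can be illustrated without MPI or Dask. In this hypothetical sketch, threads stand in for MPI ranks that advance in lockstep supersteps (local compute, then a barrier), and a `ThreadPoolExecutor` stands in for the Dask cluster: each rank hands a copy of its block to the pool as an independent task and immediately continues. The names `run` and `block_sum` and the block size are assumptions for illustration; DEISA's real coupling goes through PDI and Dask.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List, Tuple

BLOCK_SIZE = 5  # cells owned by each simulated rank (assumed)


def block_sum(values: List[float]) -> float:
    # Per-block analysis task, decoupled from the simulation's timing.
    return sum(values)


def run(n_ranks: int = 4, n_steps: int = 3) -> Dict[int, float]:
    barrier = threading.Barrier(n_ranks)  # end-of-superstep synchronization
    lock = threading.Lock()
    tasks: Dict[Tuple[int, int], Future] = {}

    with ThreadPoolExecutor(max_workers=2) as pool:

        def rank_main(rank: int) -> None:
            data = [float(rank)] * BLOCK_SIZE
            for step in range(n_steps):
                # BSP compute phase: purely local update.
                data = [x + 1.0 for x in data]
                # Hand a copy of the block to the task-based side and move on.
                fut = pool.submit(block_sum, list(data))
                with lock:
                    tasks[(step, rank)] = fut
                # BSP synchronization phase: all ranks finish the step together.
                barrier.wait()

        ranks = [threading.Thread(target=rank_main, args=(r,)) for r in range(n_ranks)]
        for t in ranks:
            t.start()
        for t in ranks:
            t.join()

        # Reduce the per-block task results into one value per time step.
        return {
            step: sum(tasks[(step, r)].result() for r in range(n_ranks))
            for step in range(n_steps)
        }


print(run())  # {0: 50.0, 1: 70.0, 2: 90.0}
```

The simulation side never waits for the analysis tasks; it only synchronizes with its own peers at the barrier, which mirrors the division of labor the hybrid model is meant to provide.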
DEISA simplifies the transition from post-hoc to in-situ analysis. It retrieves metadata from the PDI-enabled simulation code and transfers it to the analysis client, which makes it possible to optimize data transfers between simulation and analysis. DEISA also gives users finer control over the analysis they perform, for example by restricting it to specific subsets of the data or to particular time steps.
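One way metadata can drive transfer optimization is sketched below, under assumptions: the simulation publishes its global array shape, its regular 2D block decomposition and its number of time steps, and the analysis states which time steps and which index window it needs. `SimMetadata`, `blocks_to_transfer` and this metadata layout are hypothetical, not DEISA's interface; the sketch only shows why knowing the decomposition in advance lets the system skip blocks the analysis will never read.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class SimMetadata:
    """Hypothetical metadata a simulation could publish to the analysis side."""
    global_shape: Tuple[int, int]  # (rows, cols) of the full 2D field
    proc_grid: Tuple[int, int]     # number of ranks along rows and cols
    n_steps: int                   # number of time steps the run will produce

    def block_bounds(self, rank: int) -> Tuple[range, range]:
        # Regular block decomposition, ranks numbered row-major over proc_grid;
        # assumes each dimension divides evenly.
        pr, pc = self.proc_grid
        br, bc = self.global_shape[0] // pr, self.global_shape[1] // pc
        i, j = divmod(rank, pc)
        return range(i * br, (i + 1) * br), range(j * bc, (j + 1) * bc)


def overlaps(a: range, b: range) -> bool:
    # Intersection test for two unit-step index ranges (empty ranges never overlap).
    return len(a) > 0 and len(b) > 0 and a.start < b.stop and b.start < a.stop


def blocks_to_transfer(meta: SimMetadata, steps: Iterable[int],
                       rows: range, cols: range) -> Set[Tuple[int, int]]:
    """Return the (step, rank) blocks that intersect the requested selection."""
    n_ranks = meta.proc_grid[0] * meta.proc_grid[1]
    selected: Set[Tuple[int, int]] = set()
    for step in steps:
        if not 0 <= step < meta.n_steps:
            continue  # the simulation never produces this step
        for rank in range(n_ranks):
            r, c = meta.block_bounds(rank)
            if overlaps(r, rows) and overlaps(c, cols):
                selected.add((step, rank))
    return selected


meta = SimMetadata(global_shape=(8, 8), proc_grid=(2, 2), n_steps=10)
# Column 2 over all rows, at time steps 0 and 5 only.
needed = blocks_to_transfer(meta, steps=[0, 5], rows=range(0, 8), cols=range(2, 3))
print(sorted(needed))  # [(0, 0), (0, 2), (5, 0), (5, 2)]
```

In this example only 4 of the 40 blocks produced over the run (10 steps × 4 ranks) would need to move to the analysis side.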
The poster will demonstrate the practical application of PDI and DEISA through concrete examples, including a 2D heat equation simulation and the ARK-MHD simulation. In the heat equation case, DEISA facilitates the generation of an AI training dataset and performs inference to detect heat sources. For the ARK-MHD simulation, DEISA extracts 2D slices of the full 3D domain and performs an FFT on those slices. The poster will highlight how these tools enable real-time data analysis and avoid the need for extensive I/O operations.
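As an illustration of the heat-equation use case, the sketch below advances a 2D heat equation with the standard explicit five-point finite-difference scheme (FTCS, stable for r = αΔt/Δx² ≤ 1/4 on a uniform grid) and, every few steps, hands the field to an in-situ callback that accumulates labeled snapshots instead of writing files. Everything here is an assumption for illustration: the grid size, coefficient `r`, source strength, sampling period, and especially `hottest_cell`, a naive argmax that merely stands in for the AI inference mentioned above. The poster's actual model and the ARK-MHD slice/FFT pipeline are not reproduced.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]


def heat_step(u: Grid, r: float, source: Tuple[int, int], power: float) -> Grid:
    """One explicit (FTCS) step; boundary cells stay fixed at their value (Dirichlet)."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            lap = u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4.0 * u[i][j]
            new[i][j] = u[i][j] + r * lap  # r = alpha*dt/dx**2, stable for r <= 0.25
    si, sj = source
    new[si][sj] += power  # constant point heat source (assumed interior cell)
    return new


def simulate(n: int, steps: int, source: Tuple[int, int],
             on_sample: Callable[[int, Grid], None], every: int,
             r: float = 0.2, power: float = 1.0) -> Grid:
    u = [[0.0] * n for _ in range(n)]
    for step in range(1, steps + 1):
        u = heat_step(u, r, source, power)
        if step % every == 0:
            # In-situ hook: the field is handed to the analysis, not written to disk.
            on_sample(step, u)
    return u


def hottest_cell(u: Grid) -> Tuple[int, int]:
    # Naive stand-in for a trained detector: location of the maximum value.
    cells = [(i, j) for i in range(len(u)) for j in range(len(u[0]))]
    return max(cells, key=lambda ij: u[ij[0]][ij[1]])


# Build a small labeled dataset in situ: (snapshot, true source position).
dataset: List[Tuple[Grid, Tuple[int, int]]] = []
source = (3, 5)
simulate(10, 50, source,
         lambda step, u: dataset.append(([row[:] for row in u], source)),
         every=10)
print(len(dataset), hottest_cell(dataset[-1][0]))  # 5 (3, 5)
```

Passing the callback into the time loop keeps the solver unaware of what is done with each snapshot, which is the same decoupling the tools aim for at scale.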
Contributors:
Format
On Demand, On Site

