Bitwise Reproducible Execution of Unstructured Mesh Applications

Engineering applications use floating-point arithmetic, which is not associative under the IEEE 754 standard. In a parallel environment, this usually makes the application non-reproducible, because the order in which operations are performed is non-deterministic. In previous work, by introducing temporary arrays and performing multiple sweeps over an unstructured mesh, we achieved bitwise reproducibility even when the application is run with different numbers of MPI processes. In this work we present a new approach: we generate a reproducible coloring defined on the mesh and use it to reorder the execution of the application so that it produces identical results. We implement our approach in the OP2 domain-specific library, which provides an API that abstracts the solution of unstructured mesh computations, and demonstrate how the whole process can be automated without user intervention. We analyze the performance of our method on three applications: a simple finite volume application, a more complex finite element code that uses a conjugate-gradient solver, and a multi-layer CFD code. On these applications, the price of full bitwise reproducibility is a slowdown of between 1.07x and 2.9x on CPUs and between 2.1x and 19.9x on GPUs.
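The abstract does not describe how the coloring is constructed or applied, so the following is only a minimal Python sketch of the general idea, under stated assumptions. Edges are colored greedily in ascending global-ID order (a hypothetical stand-in for the paper's reproducible coloring, not OP2's actual algorithm), and indirect increments are applied one color at a time. Because no two edges of the same color share a node, each node receives its contributions in ascending color order, no matter how edges are scheduled within a color. The functions `greedy_edge_coloring`, `accumulate` and `reproducible_accumulate` are hypothetical illustrations and not part of the OP2 API.

```python
def greedy_edge_coloring(edges):
    """Hypothetical reproducible coloring: visit edges in ascending global ID
    and give each the smallest color not used by an earlier edge sharing a
    node. The result depends only on the global mesh, not on partitioning."""
    colors = []
    used = {}  # node id -> set of colors already taken by incident edges
    for a, b in edges:
        taken = used.setdefault(a, set()) | used.setdefault(b, set())
        c = 0
        while c in taken:
            c += 1
        colors.append(c)
        used[a].add(c)
        used[b].add(c)
    return colors


def accumulate(edges, contrib, n_nodes, order):
    """Naive indirect increment: each edge adds its contribution to both end
    nodes in the given visiting order (models an arbitrary parallel schedule)."""
    res = [0.0] * n_nodes
    for e in order:
        a, b = edges[e]
        res[a] += contrib[e]
        res[b] += contrib[e]
    return res


def reproducible_accumulate(edges, contrib, n_nodes, colors, order):
    """Apply increments color by color. Within one color every node gets at
    most one increment, so the visiting order inside a color has no effect."""
    res = [0.0] * n_nodes
    for c in range(max(colors) + 1):
        for e in order:
            if colors[e] == c:
                a, b = edges[e]
                res[a] += contrib[e]
                res[b] += contrib[e]
    return res


if __name__ == "__main__":
    # Tiny mesh: 4 nodes, 5 edges; contributions chosen so that the
    # summation order at node 0 changes the rounded result.
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 0)]
    contrib = [1e16, 1.0, -1e16, 3.0, 0.1]
    forward, backward = [0, 1, 2, 3, 4], [4, 3, 2, 1, 0]
    colors = greedy_edge_coloring(edges)
    print(accumulate(edges, contrib, 4, forward)[0])   # 0.1
    print(accumulate(edges, contrib, 4, backward)[0])  # 0.0
    print(reproducible_accumulate(edges, contrib, 4, colors, forward)[0])   # 0.0
    print(reproducible_accumulate(edges, contrib, 4, colors, backward)[0])  # 0.0
```

In this sketch the cost of determinism is that colors must be processed one after another; within a single color, edges touch disjoint nodes and could still be executed in parallel.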