Accurate and Reproducible Conjugate Gradient in Hybrid Parallel Environments

Parallel Math LibraryExtreme-Scale ParallelismPerformance & Correctness Tools



On Krylov subspace methods such as the Conjugate Gradient (CG), the number of iterations until convergence may increase due to the loss of computation accuracy caused by rounding errors in floating-point computations. Besides, as the order of operations is non-deterministic on parallel computations, the result and the behavior of the convergence may diverge in different environments, even for the same input. Hence, we propose three algorithmic strategies to secure accurate and reproducible computations in CG: - The first solution originates from the ExBLAS approach, which efficiently combines floating-point expansions and Kulisch accumulator, and preserves every bit of result until the final rounding. - The second is based on the Ozaki scheme, which is the error-free transformation for dot-product that can ensure the correct-rounding. One of the benefits of the method is that it can be built upon vendor-provided linear algebra libraries such as Intel Math Kernel Library and NVIDIA cuBLAS/ cuSparse, reducing the development cost. - The third is a customized version of the first that relies upon floating-point expansions and, hence, expands the intermediate precision. Instead of converting the entire solver, we identify those parts that violate non-associativity and fix them. These algorithmic strategies are reinforced with programmability suggestions to assure deterministic executions. Finally, we demonstrate the applicability and the effectiveness of the proposed strategies on CPUs, GPUs, and cross-platform using the 3D Poisson matrix as well as the matrices from the SuiteSparse Matrix Collection.