RESEARCH POSTER WINNER - 2nd PLACE: A Fast Infinite Precision Inner Product using Ozaki Scheme and Dot2, and Its Application to Reproducible Conjugate Gradient Solvers

Wednesday, June 1, 2022 2:04 PM to 2:08 PM · 4 min. (Europe/Berlin)
Hall D - 2nd Floor
Mixed Precision Algorithms

Information

We propose a new optimization method for an infinite-precision inner product based on the Ozaki scheme. Infinite-precision operations are beneficial not only for accurate computation but also for reproducible computation: they return bit-identical results for the same input, regardless of the computational environment. Our idea is to use high-precision arithmetic for the computations inside the Ozaki scheme when it is applied to memory-intensive BLAS operations. Based on the roofline model, we exploit the fact that Dot2, an inner-product algorithm that computes the result as if in twice the working precision, can run at memory-bound speed. As a result, we achieved a speedup of up to about 2x over the conventional implementation. The same idea can be applied to infinite-precision sparse matrix-vector multiplication. As an application, we demonstrate its use in a reproducible Conjugate Gradient (CG) solver on GPUs. This solver achieved the lowest overhead among reproducible CG solvers developed to date; the overhead depends on the problem, but in our evaluation it was about 3.2 to 11.0 times that of the standard FP64 implementation.
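
To illustrate the Dot2 building block named in the abstract, here is a minimal Python sketch of the commonly cited Dot2 algorithm (Ogita, Rump, and Oishi), built from the error-free transformations TwoSum and TwoProduct. It is not the authors' GPU implementation. The function names are illustrative, and TwoProduct is written with Dekker's splitting rather than a fused multiply-add, assuming IEEE 754 binary64 floats with no overflow or underflow.

```python
def two_sum(a, b):
    """Error-free sum: returns (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Dekker/Veltkamp split of a binary64 value into two non-overlapping halves."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Error-free product: returns (p, e) with p = fl(a * b) and a * b = p + e exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def dot2(x, y):
    """Inner product evaluated as if in twice the working precision, then rounded."""
    p, s = two_prod(x[0], y[0])
    for xi, yi in zip(x[1:], y[1:]):
        h, r = two_prod(xi, yi)   # product and its rounding error
        p, q = two_sum(p, h)      # running sum and its rounding error
        s += q + r                # accumulate the error terms separately
    return p + s


# Hypothetical test vectors chosen so that plain summation loses the small term.
x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
naive = sum(xi * yi for xi, yi in zip(x, y))
compensated = dot2(x, y)
```

In this example the plain FP64 sum cancels the small term, while `dot2` recovers it. Dot2 does more floating-point work per element than a plain dot product but reads the same data, which is consistent with the abstract's roofline argument that it can remain memory-bound.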
Contributors:

  • Daichi Mukunoki (RIKEN Center for Computational Science)
  • Toshiyuki Imamura (RIKEN Center for Computational Science)
  • Takeshi Ogita (Tokyo Woman's Christian University)
  • Katsuhisa Ozaki (Shibaura Institute of Technology)
Format
On-site