Accurate Matrix Multiplication on Binary128 using Ozaki Scheme



Although IEEE 754-2008 binary128 (quadruple precision, with a 15-bit exponent and a 113-bit mantissa) is not implemented in hardware on many general-purpose processors, including x86, the Intel and GNU compilers provide software emulation of it on x86. However, its performance is significantly lower than that of binary64, which is implemented in hardware. This study presents a fast method for matrix multiplication on binary128 data using the Ozaki scheme, an accurate matrix multiplication method based on error-free transformations for matrix multiplication, proposed by Ozaki et al. in 2012. The method can use binary64 matrix multiplication (DGEMM) for the main part of the computation, so good performance can be achieved at low implementation cost (especially the cost of SIMD and thread parallelization). However, the performance of our method decreases as the range of the absolute values of the input matrix elements increases. In this presentation, we demonstrate that, on Intel’s Skylake CPUs, our method can in most cases achieve better performance than existing GEMM routines based on binary128 emulation or double-double arithmetic in MPLAPACK, a multi-precision BLAS/LAPACK.
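
The idea of splitting high-precision inputs into slices whose products are exact in binary64 can be illustrated with a minimal sketch. This is not the implementation presented here. It makes several assumptions: Python `Fraction` values stand in for binary128 data, a plain triple loop (`matmul_f64`) stands in for DGEMM, and the slice products are summed exactly rather than in binary128. The function names and the `max_slices` cap are hypothetical. Overflow and underflow are ignored.

```python
from fractions import Fraction
from math import ceil, log2


def transpose(M):
    return [list(col) for col in zip(*M)]


def split_rows(A, n, max_slices=64):
    """Split an exact matrix into float slices, row by row.

    Each slice element is k * 2**e with |k| <= 2**bits, and e is shared by
    the whole row, so length-n dot products of slices are exact in binary64.
    If max_slices is reached, the remaining residual is dropped.
    """
    beta = ceil((53 + log2(n)) / 2)  # headroom so n-term sums fit in 53 bits
    bits = 53 - beta                 # integer bits kept per slice element
    residual = [[Fraction(x) for x in row] for row in A]
    slices = []
    for _ in range(max_slices):
        if all(x == 0 for row in residual for x in row):
            break
        S = []
        for i, row in enumerate(residual):
            mu = max(abs(x) for x in row)
            if mu == 0:
                S.append([0.0] * len(row))
                continue
            # tau satisfies mu < 2**tau, hence |x / unit| < 2**bits below
            tau = mu.numerator.bit_length() - mu.denominator.bit_length() + 1
            unit = Fraction(2) ** (tau - bits)
            top = [round(x / unit) * unit for x in row]   # leading bits
            residual[i] = [x - t for x, t in zip(row, top)]
            S.append([float(t) for t in top])             # exact conversion
        slices.append(S)
    return slices


def matmul_f64(X, Y):
    """Plain binary64 matrix product (stand-in for DGEMM)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def ozaki_matmul(A, B):
    """Multiply A (m x n) by B (n x p) via binary64 products of slices."""
    n = len(B)
    A_slices = split_rows(A, n)
    B_slices = [transpose(S) for S in split_rows(transpose(B), n)]
    C = [[Fraction(0)] * len(B[0]) for _ in A]
    for Ap in A_slices:
        for Bq in B_slices:
            P = matmul_f64(Ap, Bq)  # every entry is computed without rounding
            for i, row in enumerate(P):
                for j, v in enumerate(row):
                    C[i][j] += Fraction(v)  # exact accumulation (assumption)
    return C


# Inputs that fit in binary128 but not in binary64.
A = [[Fraction(1) + Fraction(1, 2**100), Fraction(3)],
     [Fraction(-2), Fraction(1, 2**60)]]
B = [[Fraction(1), Fraction(1, 2**70)],
     [Fraction(5), Fraction(-7)]]
C = ozaki_matmul(A, B)
```

In this sketch the number of binary64 products is `len(A_slices) * len(B_slices)`. A row or column that mixes very large and very small magnitudes needs more slices, which is consistent with the dependence on the absolute value range noted above.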
