High Performance Non-Gaussian Spatial Modeling on Manycore Systems

Parallel Math LibraryExtreme-Scale ParallelismHPC for Big Data Analytics



The Gaussian random field (GRF) constitutes a powerful tool to model geospatial data in the pursuit of understanding the underlying relations between spatial locations and their impacts on spatial prediction performance. GRF modeling involves the maximum likelihood estimation (MLE) operation which requires solving a nonconvex optimization problem iteratively. Although the GRF modeling performs well in many cases, real data could be highly non-Gaussian and may show characteristics like data skewness and/or heavier tails. Thus, in this work, we adopt the non-Gaussian modeling scheme using Tukey g-and-h random fields (TGHRF). TGHRF has the ability to incorporate both skewness and heavy-tailed features of real geospatial data while preserving the ability to model traditional Gaussian data. We propose a high-performance implementation of the TGHRF modeling and prediction operations on large-scale systems using task-based parallelism, state-of-the-art parallel linear algebra libraries, and runtime systems. The proposed implementation has been integrated into ExaGeoStat, a parallel high-performance framework for computational geostatistics on many-core systems. We report the performance results of the non-Gaussian MLE operation on several multicore architectures such as Intel Haswell, Intel skylake, Intel Cascade Lake, and AMD EPYC Rome. Moreover, using Shaheen-II, i.e., a Cray XC40 supercomputer, we are able to achieve up to 78 TFlops/s using 128 nodes for the MLE operation and up to 37 TFlops/s for the prediction operation. The accuracy of the proposed implementation is also estimated using both synthetic and real datasets.