Stencil Computation with OMP, SYCL, and Tensor Compiler on Intel GPUs

Wednesday, June 24, 2026 3:45 PM to 5:15 PM · 1 hr. 30 min. (Europe/Berlin)

Foyer D-G - 2nd Floor

Project Poster

Compiler and Tools for Parallel Programming · Geosciences · Industrial Use Cases of HPC, ML and QC · Parallel Programming Languages

Information

Poster is on display.

High-order stencil computations are a fundamental component of many scientific simulations and are widely used in finite-difference (FD) schemes for solving partial differential equations. Stencil computations remain challenging to optimize because they are inherently memory-bandwidth bound on most available processor architectures and require careful management of data reuse and cache efficiency. These challenges are further amplified by the increasing complexity of modern CPU and GPU architectures.

This work analyzes a standard high-order stencil computation arising from a finite-difference discretization of the 3D acoustic isotropic wave equation. The original implementation is a validated proxy application from exploration geophysics. The spatial discretization employs a 25-point, 8th-order stencil: the central point plus four neighboring points on each side along each of the three spatial axes. At each grid point and time step, the wavefield update requires the evaluation of a discrete approximation of the Laplacian operator, resulting in many global memory accesses with limited temporal reuse.
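As a minimal sketch of the stencil shape described above (not the proxy application's code), the following C++ evaluates a 25-point Laplacian at one interior grid point. The weights are the standard 8th-order central-difference coefficients for a second derivative. The `Grid3D` type, the `laplacian25` function, the row-major layout, and the use of double precision are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard 8th-order central-difference weights for a second derivative
// (unit grid spacing): index 0 is the centre, 1..4 the neighbour distance.
constexpr std::array<double, 5> kCoeff = {
    -205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0};

constexpr int kRadius = 4;  // four neighbours on each side of every axis

// Hypothetical dense grid, stored with x as the fastest-varying index.
struct Grid3D {
    int nx, ny, nz;
    std::vector<double> data;

    Grid3D(int nx_, int ny_, int nz_)
        : nx(nx_), ny(ny_), nz(nz_),
          data(static_cast<std::size_t>(nx_) * ny_ * nz_, 0.0) {}

    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * ny + y) * nx + x;
    }
    double& at(int x, int y, int z) { return data[index(x, y, z)]; }
    double at(int x, int y, int z) const { return data[index(x, y, z)]; }
};

// 25-point Laplacian at an interior point (at least kRadius cells from
// every face): 1 centre + 3 axes * 2 sides * 4 neighbours = 25 points.
double laplacian25(const Grid3D& u, int x, int y, int z, double h) {
    // The centre value contributes once per axis, hence the factor 3.
    double acc = 3.0 * kCoeff[0] * u.at(x, y, z);
    for (int r = 1; r <= kRadius; ++r) {
        acc += kCoeff[r] * (u.at(x - r, y, z) + u.at(x + r, y, z) +
                            u.at(x, y - r, z) + u.at(x, y + r, z) +
                            u.at(x, y, z - r) + u.at(x, y, z + r));
    }
    return acc / (h * h);
}
```

Each output value reads 25 inputs but performs only a few dozen floating-point operations, which is why the pattern is limited by memory bandwidth unless neighboring reads are served from cache or registers.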

We investigate the application of a tensor compiler paradigm to this well-established computing pattern and compare it against two conventional programming approaches. Thus, three implementations are evaluated on Intel GPUs: (1) a baseline OpenMP version relying primarily on compiler-driven optimizations, (2) a SYCL implementation providing explicit control over parallelism and memory hierarchy, and (3) a domain-specific language (DSL)–based implementation using the Tiny Tensor Compiler (TinyTC).
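To illustrate what a directive-based baseline of this kind can look like, here is a hedged C++ sketch of one time step with an OpenMP target-offload pragma. It is not the authors' implementation: the function name `wave_step`, the argument layout, the second-order leapfrog time update, and the choice of clauses are assumptions. The pragma is guarded so that the code also compiles and runs serially without OpenMP.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One leapfrog time step over the interior of an nx*ny*nz grid (x fastest).
// vel2dt2 holds (velocity * dt / h)^2 per grid point, so the Laplacian can
// use unit-spacing weights. All names here are illustrative.
void wave_step(const float* prev, const float* curr, const float* vel2dt2,
               float* next, int nx, int ny, int nz) {
    const std::size_t sy = static_cast<std::size_t>(nx);       // stride in y
    const std::size_t sz = static_cast<std::size_t>(nx) * ny;  // stride in z
    const std::size_t n = sz * nz;                             // total points
    (void)n;  // only referenced by the map clauses when OpenMP is enabled

#if defined(_OPENMP)
#pragma omp target teams distribute parallel for collapse(3) \
    map(to: prev[0:n], curr[0:n], vel2dt2[0:n]) map(tofrom: next[0:n])
#endif
    for (int z = 4; z < nz - 4; ++z) {
        for (int y = 4; y < ny - 4; ++y) {
            for (int x = 4; x < nx - 4; ++x) {
                // Standard 8th-order second-derivative weights.
                const float c[5] = {-205.0f / 72.0f, 8.0f / 5.0f, -1.0f / 5.0f,
                                    8.0f / 315.0f, -1.0f / 560.0f};
                const std::size_t i =
                    (static_cast<std::size_t>(z) * ny + y) * nx + x;
                float lap = 3.0f * c[0] * curr[i];
                for (int r = 1; r <= 4; ++r) {
                    lap += c[r] * (curr[i - r] + curr[i + r] +
                                   curr[i - r * sy] + curr[i + r * sy] +
                                   curr[i - r * sz] + curr[i + r * sz]);
                }
                next[i] = 2.0f * curr[i] - prev[i] + vel2dt2[i] * lap;
            }
        }
    }
}
```

In this style the programmer states only which loops are parallel and which arrays move to the device; tiling, vectorization, and cache blocking are left to the compiler. A SYCL or TinyTC version would instead spell out work-group shapes and data movement explicitly.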

TinyTC is a compiler designed to parse, compile, and execute programs written in a tensor-oriented DSL optimized for Intel GPU hardware. It supports a tile-based programming paradigm built on cooperative matrices (subgroup-distributed matrices) and directly generates SPIR-V with minimal dependencies. This design enables fine-grained control over data movement, cache utilization, and instruction selection, while abstracting the implementation details of accelerating matrix multiplication on vector and tensor cores. In particular, TinyTC allows stencil computations to be mapped directly onto hardware tensor instructions, including 2D block load operations that maximize effective memory bandwidth while benefiting from hardware-supported out-of-bounds access handling.

Experimental results demonstrate that the TinyTC-based implementation significantly outperforms both the SYCL and OpenMP versions. On current Intel GPUs, the tensor compiler approach achieves up to 85% of the theoretical peak memory bandwidth, highlighting its effectiveness for bandwidth-bound stencil workloads. TinyTC delivers a performance improvement of approximately 20% over the optimized SYCL implementation and of up to 40% over the OpenMP baseline.
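A fraction-of-peak figure of this kind is typically derived from a simple traffic model. The sketch below shows one common way to compute it; the poster does not state how its bandwidth figure was measured, so the four-array traffic model and the function names `effective_bandwidth_gbs` and `fraction_of_peak` are assumptions for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Assumed minimum traffic per grid-point update with perfect cache reuse:
// read the previous wavefield, the current wavefield and the velocity
// model, and write the next wavefield.
constexpr double kArraysTouched = 4.0;

// Effective bandwidth in GB/s for one time step over `grid_points` points.
double effective_bandwidth_gbs(std::size_t grid_points,
                               std::size_t bytes_per_value,
                               double seconds_per_step) {
    const double bytes = kArraysTouched *
                         static_cast<double>(bytes_per_value) *
                         static_cast<double>(grid_points);
    return bytes / seconds_per_step / 1.0e9;
}

// Ratio of achieved bandwidth to the device's theoretical peak.
double fraction_of_peak(double achieved_gbs, double peak_gbs) {
    return achieved_gbs / peak_gbs;
}
```

Under this model, a 1000 x 1000 x 1000 single-precision grid moves 16 GB per time step, so the fraction of peak follows directly from the measured time per step and the peak bandwidth of the device.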

Beyond performance gains on existing hardware, the TinyTC implementation is designed to be forward-compatible with future Intel GPU architectures, including Crescent Island. By relying on SPIR-V generation and explicit tensor instruction mapping, the proposed approach is well positioned to exploit upcoming hardware capabilities without major code restructuring.

These results demonstrate that tensor compiler paradigms, originally developed for AI workloads, can be successfully applied to classical HPC stencil computations. TinyTC enables near-peak memory bandwidth utilization and provides a portable, future-ready programming model for high-performance stencil applications on current and next-generation Intel GPUs.

Contributors:

Mauricio Araya-Polo

Format

on-demand · on-site

Speakers

Timothée Ewart

Software Engineer, Intel

Session

Project Poster Reception

Wednesday, June 24, 2026 3:45 PM to 5:15 PM