

Online Deep Learning Training and Inference in HPC Programs with TorchFort Library
Monday, June 22, 2026 2:00 PM to 6:00 PM · 4 hr. (Europe/Berlin)
Hall X9 - 1st Floor
Tutorial
AI Applications powered by HPC Technologies · Engineering · HPC Simulations enhanced by Machine Learning · ML Systems and Frameworks · Physics
Information
Researchers are using numerical simulation data to train deep learning (DL) models for a wide variety of tasks. These models include surrogate models for efficient parameter space exploration, regression models for approximating numerics, generative models for super-resolution, and reinforcement learning (RL) models for control. However, as simulations are run at increasingly high resolutions, the resulting data volumes grow rapidly and become difficult to harness for deep learning. For example, a single time snapshot from a high-resolution direct numerical simulation (DNS) in computational fluid dynamics (CFD) can occupy hundreds of GB. To circumvent this, we can adopt the online training approach, in which the DL training process runs concurrently with the simulation and the training data is read directly from memory without being stored to disk. Online training is also a natural framework for reinforcement learning applications, as they require interaction between the agent and the simulation environment.
Fortran and C/C++ HPC codes underpin the majority of scientific computing applications, whereas deep learning is dominated by Python. In this tutorial, we will show how to use the TorchFort library to perform online DL training and inference with Fortran- and C++-based numerical simulation programs. The tutorial is structured as follows. First, we start with a lecture that covers the most common techniques, model architectures and applications in AI for Science. In the lecture, we also delve deeper into the online (in-situ) learning approach and describe the TorchFort library in detail. The last two hours of the tutorial are dedicated to a series of hands-on exercises in which participants are guided through implementing the online training and inference approach within a real Fortran-based simulation code.
Prerequisite: We will use the NVIDIA Brev platform to run the exercises. Prior to the tutorial, please email the teaching assistant, Benet Eiximeno (beiximeno@nvidia.com), to receive an invitation to register and join the tutorial group on Brev.
Format
on-site
Targeted Audience
Numerical simulation researchers and scientific AI researchers, in particular those who are interested in combining Fortran- and C++-based HPC codes with AI capabilities.
Beginner Level
50%
Intermediate Level
50%
Prerequisites
Participants should bring their own laptop. We will arrange a compute platform for the duration of the tutorial together with a containerised environment, including pre-built TorchFort-enabled applications that participants can modify.
