Large-Scale Computations with QCG-PilotJob

HPC Workflows

Information

Contributors:
Abstract:

The growing demands of computational scenarios and the growing popularity of large-scale HPC computing call for systems that combine good efficiency, great flexibility and ease of use. Verification, Validation and Uncertainty Quantification of complex multiscale applications, the main topic of VECMA, requires extremely large computing power. The calculations needed to analyse the use cases developed within the project are expected to consume the capacity not only of currently available petascale resources but also of emerging exascale ones. This places high demands on the software that supports such computations.

QCG-PilotJob is a lightweight Python implementation designed to enable easy and highly efficient execution of user tasks in the so-called pilot job flavour on HPC machines. A QCG-PilotJob instance started within a regular queuing system allocation can be seen as a separate, second-level, private queuing system. That is, once QCG-PilotJob is started, the user has full control over the tasks submitted to it. There are two basic ways of interacting with QCG-PilotJob. The static way lets the user prepare a configuration of tasks in advance and submit it when QCG-PilotJob starts. The dynamic way lets the user interact with an already running QCG-PilotJob instance. Importantly, QCG-PilotJob does not need any external services, so a user can run it in practically any environment, from a local machine for tests to a SLURM system for production runs.
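The idea of a private, second-level queue with static and dynamic submission can be illustrated with a small conceptual sketch. It deliberately does not use the QCG-PilotJob API: `ToyPilot`, `submit`, `wait_all`, `finish` and `demo` are hypothetical names invented for this illustration, and a thread pool stands in for the cores of an allocation.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


class ToyPilot:
    """Hypothetical second-level scheduler; NOT the QCG-PilotJob API."""

    def __init__(self, cores, initial_tasks=None):
        # One worker thread per core of the (pretend) allocation.
        self._pool = ThreadPoolExecutor(max_workers=cores)
        self._futures = {}
        # Static mode: tasks prepared in advance are queued at startup.
        for name, argv in (initial_tasks or {}).items():
            self.submit(name, argv)

    def submit(self, name, argv):
        # Dynamic mode: the same call works while the pilot is running.
        if name in self._futures:
            raise ValueError(f"task {name!r} already submitted")
        self._futures[name] = self._pool.submit(
            subprocess.run, argv, capture_output=True, text=True
        )

    def wait_all(self):
        # Block until every task has finished; map task name to exit code.
        return {n: f.result().returncode for n, f in self._futures.items()}

    def finish(self):
        # Release the worker threads once all work is done.
        self._pool.shutdown(wait=True)


def demo():
    py = sys.executable  # run tiny Python one-liners as the "user tasks"
    static_tasks = {
        "ok": [py, "-c", "print('hello')"],
        "fail": [py, "-c", "import sys; sys.exit(3)"],
    }
    pilot = ToyPilot(cores=2, initial_tasks=static_tasks)  # static way
    pilot.submit("late", [py, "-c", "pass"])               # dynamic way
    codes = pilot.wait_all()
    pilot.finish()
    return codes
```

Calling `demo()` returns the exit code of each task by name. In this toy model the only difference between the two modes is when `submit` is called: at construction time for the static configuration, or later against the running instance.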
