Large-Scale Computations with QCG-PilotJob
HPC Workflows
Information
Contributors:
- Hamid Arabnejad (Brunel University London)
- Derek Groen (Brunel University London)
- Paul Karlshoefer (Center for Excellence in Performance Programming ATOS)
- Piotr Kopta (Poznań Supercomputing and Networking Center)
- Michał Kulczewski (Poznań Supercomputing and Networking Center)
- Jalal Lakhlili (Max-Planck Institute for Plasma Physics)
- Tomasz Piontek (Poznań Supercomputing and Networking Center)
- Erwan Raffin (Atos)
- Bartosz Bosak (Poznań Supercomputing and Networking Center)
Abstract:
Increasingly demanding computational scenarios and the growing popularity of large-scale HPC computing call for systems that combine good efficiency, great flexibility and ease of use. Verification, Validation and Uncertainty Quantification of complex multiscale applications, the main topic of VECMA, requires extremely large computing power. The calculations needed to analyse the use cases being developed within the project are expected to consume the capacity not only of currently available petascale resources but also of emerging exascale ones. This places high demands on the software that supports such computations.
QCG-PilotJob is a lightweight Python implementation designed to enable easy and highly efficient execution of user tasks in the so-called pilot job manner on HPC machines. A QCG-PilotJob instance started within a regular queuing system allocation can be seen as a separate, second-level, private queuing system. That is, once QCG-PilotJob is started, the user has full control over the tasks submitted to it. There are two basic ways of interacting with QCG-PilotJob. The static way lets the user prepare a configuration of tasks in advance and submit it when QCG-PilotJob starts. The dynamic way allows interaction with an already running QCG-PilotJob instance. Importantly, QCG-PilotJob does not need any external services, so a user can run it in practically any environment, from a local machine for tests to a SLURM system for production runs.
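To make the difference between the static and the dynamic way of interaction more concrete, here is a minimal sketch of the pilot job idea using only the Python standard library. It does not use or reproduce the QCG-PilotJob API: the `MiniPilot` class, its `submit`, `wait` and `finish` methods, and the `square` task are all hypothetical names introduced for illustration. It also assumes that every task needs exactly one core, with a worker thread standing in for a core.

```python
from concurrent.futures import ThreadPoolExecutor


class MiniPilot:
    """Toy second-level scheduler (hypothetical; NOT the QCG-PilotJob API)."""

    def __init__(self, cores, tasks=()):
        # Assumption: one worker thread stands in for one core of the
        # allocation, and every task needs exactly one core.
        self._pool = ThreadPoolExecutor(max_workers=cores)
        self._futures = {}
        # Static way: tasks prepared in advance are queued at startup.
        for name, func, arg in tasks:
            self.submit(name, func, arg)

    def submit(self, name, func, arg):
        """Dynamic way: queue a task on the already running instance."""
        if name in self._futures:
            raise ValueError(f"duplicate task name: {name}")
        self._futures[name] = self._pool.submit(func, arg)
        return name

    def wait(self, names=None):
        """Block until the named tasks (default: all) finish; return results."""
        names = list(self._futures) if names is None else names
        return {name: self._futures[name].result() for name in names}

    def finish(self):
        """Shut the instance down once all queued tasks have completed."""
        self._pool.shutdown(wait=True)


def square(x):
    return x * x


# Static way: the whole task configuration is known before startup.
static_tasks = [(f"sq-{i}", square, i) for i in range(4)]
pilot = MiniPilot(cores=2, tasks=static_tasks)
first = pilot.wait()

# Dynamic way: a follow-up task is submitted to the running instance,
# here depending on the results of the earlier tasks.
pilot.submit("total", sum, list(first.values()))
total = pilot.wait(["total"])["total"]
pilot.finish()

print(first)
print(total)
```

In this sketch the four `square` tasks share two workers, much as tasks share the cores of a single allocation, and the final `total` task is only defined after the earlier results are known, which is the kind of scenario the dynamic way is meant for. The real QCG-PilotJob interfaces differ and should be taken from its own documentation.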