A Scheduling Model for the Square Kilometre Array Science Data Processor
Exascale Systems, HPC Workflows
Information
This PhD investigates the current design of the
Square Kilometre Array (SKA) Science Data Processor (SDP). The data scale of
the SKA is such that raw observation data cannot be kept on-premises
for more than a few weeks. Data reduction and analysis
pipelines, normally run by astronomers locally (either on a PC or a local HPC
centre), must instead be completed within this deadline, to ensure the
SKA's intermediary data storage - the SDP 'buffer' - is not filled. It is essential
that the scheduling and processing of observation data happens in an efficient
manner to guarantee the delivery of High Priority Science
Observations (HPSOs), by ensuring their post-processing pipelines are completed.
HPSO post-processing pipelines are represented as
Directed Acyclic Graphs (DAGs), commonly referred to as science
workflows. There is a large body of literature associated with workflow
scheduling, both in grid and cloud computing environments. However, the
proposed system for the SKA uses a batch-processing model, rather than
applying one of the various workflow scheduling heuristics in the
literature. An analytical argument has been made that it is unnecessary
to do so given the presence of an intermediary buffer.
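As an illustration of what "workflow scheduling on a DAG" means in practice, the sketch below represents a small pipeline as a dependency graph and applies a generic greedy list-scheduling rule. It is not the SKA batch-processing model and not a specific heuristic from the literature; the task names, runtimes, and the `list_schedule` function are hypothetical and chosen only for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "ingest": set(),
    "flag": {"ingest"},
    "calibrate": {"flag"},
    "grid": {"calibrate"},
    "image": {"grid"},
    "qa": {"calibrate"},
}
# Hypothetical runtimes in arbitrary time units.
runtime = {"ingest": 2, "flag": 3, "calibrate": 5, "grid": 4, "image": 6, "qa": 1}


def list_schedule(deps, runtime, n_machines):
    """Greedy list scheduling over identical machines.

    Tasks are visited in a topological order; each task is placed on the
    machine that allows the earliest start once its dependencies finish.
    Returns (placement, makespan), where placement maps each task to
    (machine index, start time, finish time).
    """
    machine_free = [0] * n_machines  # time at which each machine becomes idle
    finish = {}
    placement = {}
    for task in TopologicalSorter(deps).static_order():
        # A task is ready once all of its predecessors have finished.
        ready = max((finish[d] for d in deps[task]), default=0)
        m = min(range(n_machines), key=lambda i: max(machine_free[i], ready))
        start = max(machine_free[m], ready)
        finish[task] = start + runtime[task]
        machine_free[m] = finish[task]
        placement[task] = (m, start, finish[task])
    return placement, max(finish.values())


placement, makespan = list_schedule(workflow, runtime, n_machines=2)
print(makespan)
```

With two machines the short `qa` task runs alongside the `grid` and `image` chain, so the makespan equals the length of the longest dependency chain; with one machine it equals the sum of all runtimes.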
This thesis intends to determine whether or not the SKA-SDP buffer and batch-processing model is sufficient for the data and compute demands of an observation schedule, especially when faced with system delays on the SDP compute infrastructure. Additionally, we are interested in determining how much more effective, if at all, existing heuristic scheduling techniques currently implemented in science workflow managers (for example, Pegasus) are at improving the quality of batch-processed schedules. Finally, given that SKA observations are so dependent on the processing of these workflows, there is an opportunity to apply workflow scheduling to a global task DAG, constructed from all workflows in an observation schedule. This PhD plans to investigate a decision support model that incorporates this global DAG.
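One way to picture a global task DAG is as the union of every observation's workflow, with each task namespaced by its observation. The sketch below makes that assumption explicit and computes the runtime-weighted critical path, which is a lower bound on the makespan of any schedule. The function names, observation identifiers, and runtimes are hypothetical; the thesis may construct and use the global DAG differently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_global_dag(observations):
    """Merge per-observation workflows into one DAG keyed by (observation, task)."""
    global_dag = {}
    for obs_id, workflow in observations.items():
        for task, deps in workflow.items():
            global_dag[(obs_id, task)] = {(obs_id, d) for d in deps}
    return global_dag


def critical_path(dag, runtime):
    """Length of the longest runtime-weighted path through the DAG."""
    finish = {}
    for node in TopologicalSorter(dag).static_order():
        finish[node] = runtime[node] + max((finish[d] for d in dag[node]), default=0)
    return max(finish.values())


# Hypothetical schedule: two observations that share the same pipeline shape.
pipeline = {"ingest": set(), "calibrate": {"ingest"}, "image": {"calibrate"}}
observations = {"obs_a": pipeline, "obs_b": pipeline}
global_dag = build_global_dag(observations)

# Hypothetical runtimes: obs_b produces twice the data, so each task takes twice as long.
base = {"ingest": 2, "calibrate": 5, "image": 6}
runtime = {(o, t): base[t] * (2 if o == "obs_b" else 1) for (o, t) in global_dag}

print(len(global_dag), critical_path(global_dag, runtime))
```

Because the merged graph is an ordinary DAG, any workflow scheduling heuristic that accepts a single workflow could in principle be applied to the whole observation schedule at once.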
To test these ideas, we need to model the SKA instrument, storage buffer, and computing facilities. For this purpose, TopSim (Telescope Operations Simulator) has been developed. It is a generalisable instrument-storage-compute discrete-event simulator that, in addition to simulating the data life-cycle and scheduling of the SKA, can simulate other global scheduling applications such as the Internet of Things (e.g. edge and fog computing) or remote sensing (e.g. geoscience or satellites).
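To make the instrument-storage-compute idea concrete, the following is a minimal discrete-event sketch written for this summary. It does not use or reproduce TopSim's API. It assumes that an observation's data lands in the buffer when the observation ends, that a single batch queue processes one pipeline at a time, and that data leaves the buffer when its pipeline finishes. All names and numbers are hypothetical.

```python
import heapq
from collections import deque


def simulate(observations, buffer_capacity):
    """Simulate observations feeding a buffer drained by one batch queue.

    observations: list of (end_time, data_size, processing_time).
    Returns the makespan, peak buffer usage, and first overflow time (or None).
    """
    events = []  # heap of (time, seq, kind, data_size, processing_time)
    for seq, (end, size, proc) in enumerate(observations):
        heapq.heappush(events, (end, seq, "obs_end", size, proc))
    seq = len(observations)  # unique tie-breaker so tuples never compare by kind
    waiting = deque()        # pipelines queued for the compute resource
    busy = False
    used = peak = 0
    overflow_at = None
    now = 0
    while events:
        now, _, kind, size, proc = heapq.heappop(events)
        if kind == "obs_end":
            used += size  # observation data is written to the buffer
            peak = max(peak, used)
            if used > buffer_capacity and overflow_at is None:
                overflow_at = now
            waiting.append((size, proc))
        else:  # "proc_end": pipeline finished, its data is released
            used -= size
            busy = False
        if not busy and waiting:
            size, proc = waiting.popleft()
            heapq.heappush(events, (now + proc, seq, "proc_end", size, proc))
            seq += 1
            busy = True
    return {"makespan": now, "peak": peak, "overflow_at": overflow_at}


on_time = simulate([(10, 4, 8), (15, 4, 8), (20, 4, 8)], buffer_capacity=10)
delayed = simulate([(10, 4, 12), (15, 4, 12), (20, 4, 12)], buffer_capacity=10)
print(on_time, delayed)
```

In the second run each pipeline takes longer, standing in for a system delay on the compute infrastructure; the third observation then arrives before any data has been released and the buffer exceeds its capacity.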
I will present an overview of the observation and scheduling model we have developed for TopSim, an example decision algorithm for the telescope using the global task graph, and preliminary results from simulations comparing scheduling heuristics.