Scheduling parameter sweep workflow in the grid
Thesis posted on 17.02.2017 by Smanchat, Sucha
Workflow technology has been adopted in scientific domains to orchestrate and automate scientific processes in order to facilitate experimentation. Such scientific workflows often involve large data sets and intensive computation that necessitate the use of the Grid, which offers supercomputing power through shared distributed resources. To execute a scientific workflow in the Grid, the tasks within the workflow, which represent steps in the scientific process, are assigned to Grid resources for execution. To ensure efficient execution of the workflow, Grid workflow scheduling is required to manage the allocation of Grid resources. Although many Grid workflow scheduling techniques exist, they are mainly designed for the execution of a single workflow. Parameter sweep workflows, which focus on parametric study and parameter optimisation, do not fit this model. A parameter sweep workflow is executed numerous times with different input parameters in order to determine the effect of each parameter combination on the experiment. While executing multiple instances of a parameter sweep workflow in parallel can reduce the overall execution time, it introduces new challenges to Grid workflow scheduling. The scheduling algorithm must not only manage multiple workflow instances but also schedule tasks across those instances judiciously, as tasks may require the same set of Grid resources. Without appropriate resource allocation, a resource competition problem can arise. In this thesis, we propose a new Grid workflow scheduling technique for parameter sweep workflows, called the Besom scheduling algorithm. The scheduling decision of our algorithm is based on the resource dependencies of tasks in the workflow, as well as conventional Grid resource-performance metrics.
In addition, the proposed technique is extended to handle loop structures in scientific workflows without using existing loop-unrolling techniques. We evaluate the Besom algorithm under a variety of conditions. A comparison between the simulation results of the Besom algorithm and those of three existing Grid workflow scheduling algorithms shows that the Besom algorithm outperforms the existing algorithms for workflows that have complex structures and that involve overlapping resource dependencies of tasks. The Besom scheduling algorithm advances the ability to schedule parallel execution of parameter sweep workflows in the Grid. The outcomes of this thesis justify continuing research in this area to increase our understanding of scheduling multiple Grid workflow instances and to provide support to those involved in parametric study and scientific workflow management.
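To make the resource competition problem described in the abstract concrete, the following minimal Python sketch enumerates the instances of a parameter sweep and counts how many tasks across all instances could request each resource. It is an illustration only and is not the Besom algorithm. The task names, resource names, parameter values, and the helper functions `sweep_instances` and `resource_demand` are all hypothetical and do not come from the thesis.

```python
from collections import defaultdict
from itertools import product

# Hypothetical workflow: task name -> set of Grid resources able to run it.
TASK_RESOURCES = {
    "prepare": {"r1", "r2"},
    "simulate": {"r2", "r3"},
    "analyse": {"r3"},
}

# Hypothetical parameter space for the sweep.
PARAMETERS = {"temperature": [280, 300], "pressure": [1.0, 2.0, 5.0]}


def sweep_instances(parameters):
    """Return one dict of parameter values per workflow instance."""
    names = sorted(parameters)
    return [
        dict(zip(names, values))
        for values in product(*(parameters[n] for n in names))
    ]


def resource_demand(task_resources, n_instances):
    """Count how many (instance, task) pairs could request each resource."""
    demand = defaultdict(int)
    for _ in range(n_instances):
        for resources in task_resources.values():
            for resource in resources:
                demand[resource] += 1
    return dict(demand)


instances = sweep_instances(PARAMETERS)
demand = resource_demand(TASK_RESOURCES, len(instances))

print(len(instances), "workflow instances")
for resource in sorted(demand):
    print(resource, "is a candidate for", demand[resource], "tasks")
```

In this toy setting, resources shared by several tasks accumulate demand from every instance of the sweep, which is the kind of overlap a scheduler for multiple workflow instances would have to take into account.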