SciCumulus: A Lightweight Cloud Middleware to Explore Many Task Computing Paradigm in Scientific Workflows
- 1 July 2010
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 21596182, p. 378-385
- https://doi.org/10.1109/cloud.2010.64
Abstract
Most large-scale scientific experiments modeled as scientific workflows produce large amounts of data and require workflow parallelism to reduce execution time. Some existing Scientific Workflow Management Systems (SWfMS) explore parallelism techniques such as parameter sweep and data fragmentation. In those systems, several computing resources are used to accomplish many computational tasks in homogeneous environments, such as multiprocessor machines or cluster systems. Cloud computing has become a popular high performance computing model in which (virtualized) resources are provided as services over the Web. Some scientists are starting to adopt the cloud model in scientific domains and are moving their scientific workflows (programs and data) from local environments to the cloud. Nevertheless, it is still difficult for the scientist to express a parallel computing paradigm for the workflow in the cloud. Capturing distributed provenance data in the cloud is also an issue. Existing approaches for executing scientific workflows using parallel processing are mainly focused on homogeneous environments whereas, in the cloud, the scientist has to manage new aspects such as the initialization of virtualized instances, scheduling over different cloud environments, the impact of data transfer, and the management of instance images. In this paper we propose SciCumulus, a cloud middleware that explores parameter sweep and data fragmentation parallelism in scientific workflow activities (with provenance support). It works between the SWfMS and the cloud. SciCumulus is designed considering cloud specificities. We have evaluated our approach by executing simulated experiments to analyze the overhead imposed by clouds on the workflow execution time.
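The two parallelism techniques named in the abstract can be illustrated with a minimal Python sketch. It is not SciCumulus code and does not reflect its API: the names `activity`, `fragment_data`, `parameter_sweep`, and `run_tasks` are hypothetical, the activity is a toy computation, and a local thread pool stands in for the virtualized cloud instances. It only shows how one workflow activity might be expanded into many independent tasks, one per pair of parameter combination and data fragment.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def activity(params, fragment):
    """Hypothetical workflow activity: offset each value, sum, then scale."""
    return params["scale"] * sum(x + params["offset"] for x in fragment)


def fragment_data(data, n_fragments):
    """Data fragmentation: split data into at most n_fragments contiguous chunks."""
    size = max(1, -(-len(data) // n_fragments))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def parameter_sweep(grid):
    """Parameter sweep: expand a dict of value lists into one dict per combination."""
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def run_tasks(data, grid, n_fragments, workers=4):
    """Run one independent task per (parameter combination, fragment) pair."""
    tasks = [(p, f) for p in parameter_sweep(grid) for f in fragment_data(data, n_fragments)]
    # The thread pool is an assumption standing in for remote cloud instances.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda task: activity(*task), tasks))
    return tasks, results
```

For example, `run_tasks(list(range(10)), {"scale": [1, 2], "offset": [0]}, n_fragments=3)` produces six tasks (two parameter combinations times three fragments). Because the tasks share no state, they could in principle be scheduled on separate instances, which is where the cloud-specific concerns listed in the abstract (instance initialization, scheduling, data transfer) would come into play.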