Risk-aware intermediate dataset backup strategy in cloud-based data intensive workflows

Mingzhong Wang*, Liehuang Zhu, Zijian Zhang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Data-intensive workflows are generally both compute- and data-intensive, generating large volumes of data during execution. Some of this data should therefore be saved to avoid the expensive re-execution of tasks when exceptions occur. However, cloud-based data storage services incur their own costs. In this paper, we introduce a risk evaluation model tailored to workflow structure that measures and achieves the trade-off between the overhead of backup storage and the cost of data regeneration upon failure, making service selection and execution more efficient and robust. The proposed method computes and compares the potential loss with and without data backup, balancing the overhead of intermediate dataset backup against that of task re-execution after exceptions. We also design a utility function based on the model and apply a genetic algorithm to find an optimized schedule. The results show that the robustness of the schedule increases while the possible risk of failure is minimized, especially when the volume of generated data is small in comparison with the input.
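The core decision the abstract describes — back up an intermediate dataset only when the expected loss from re-executing tasks after a failure outweighs the storage overhead — can be sketched as follows. This is a minimal illustration, not the paper's actual model; all function and parameter names are hypothetical, and the paper's full method additionally encodes workflow structure and uses a genetic algorithm over whole schedules.

```python
def should_backup(storage_cost_per_hour, retention_hours,
                  failure_prob, regeneration_cost):
    """Hypothetical sketch of the backup/no-backup trade-off.

    Back up an intermediate dataset only when the expected loss of
    regenerating it after a failure exceeds the cost of storing it.
    All parameters are illustrative, not taken from the paper.
    """
    backup_overhead = storage_cost_per_hour * retention_hours
    expected_loss = failure_prob * regeneration_cost
    return expected_loss > backup_overhead

# Cheap storage, costly regeneration: expected loss 10.0 vs overhead 1.0.
print(should_backup(0.01, 100, 0.2, 50.0))  # → True
# Expensive storage, cheap regeneration: expected loss 0.5 vs overhead 100.0.
print(should_backup(1.0, 100, 0.01, 50.0))  # → False
```

In the paper's setting this per-dataset comparison would be evaluated per candidate schedule, with the utility function aggregating the results and the genetic algorithm searching for the schedule that minimizes overall risk.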

Original language: English
Pages (from-to): 524-533
Number of pages: 10
Journal: Future Generation Computer Systems
Volume: 55
DOIs
Publication status: Published - 1 Feb 2016

Keywords

  • Checkpoint
  • Data-intensive workflow
  • Intermediate dataset
  • Risk evaluation
  • Robustness

