Scientific Workflows in IoT Environments: A Data Placement Strategy Based on Heterogeneous Edge-Cloud Computing

Xin Du; Songtao Tang; Zhihui Lu; Keke Gai; Jie Wu; Patrick C.K. Hung

doi:10.1145/3531327

Corresponding author: Zhihui Lu

School of Cyberspace Science and Technology

Research output: Contribution to journal › Article › peer-review

41 Citations (Scopus)

Abstract

In Industry 4.0 and Internet of Things (IoT) environments, the heterogeneous edge-cloud computing paradigm can provide a more suitable solution for deploying scientific workflows than cloud computing or other traditional distributed computing approaches. Owing to the different sizes of scientific datasets and the privacy issues concerning some of these datasets, it is essential to find a data placement strategy that can minimize data transmission time. Some state-of-the-art data placement strategies combine edge computing and cloud computing to distribute scientific datasets. However, dynamically distributing newly generated datasets to appropriate datacenters and removing spent datasets remain a challenge during workflow execution. To address this challenge, this study constructs a data placement model that includes datasets shared within individual workflows and among multiple workflows across various geographical regions, and proposes a two-stage data placement strategy (DYM-RL-DPS). First, during the build-time stage of workflows, we use the discrete particle swarm optimization algorithm with differential evolution to pre-allocate initial datasets to suitable datacenters. Then, we reformulate the dynamic dataset distribution problem as a Markov decision process and provide a reinforcement learning-based approach to learn the data placement strategy in the runtime stage of scientific workflows. Using the heterogeneous edge-cloud computing architecture to simulate IoT environments, we designed comprehensive experiments to demonstrate the superiority of DYM-RL-DPS. The results show that our strategy effectively reduces data transmission time compared to other strategies.

Original language	English
Article number	42
Journal	ACM Transactions on Management Information Systems
Volume	13
Issue number	4
DOIs	https://doi.org/10.1145/3531327
Publication status	Published - 10 Aug 2022

Keywords

Heterogeneous edge-cloud computing
IoT environments
data-sharing
scientific workflows

Cite this

Du, X., Tang, S., Lu, Z., Gai, K., Wu, J., & Hung, P. C. K. (2022). Scientific Workflows in IoT Environments: A Data Placement Strategy Based on Heterogeneous Edge-Cloud Computing. ACM Transactions on Management Information Systems, 13(4), Article 42. https://doi.org/10.1145/3531327

@article{af464c00d0cf44daa4ce9d68e777ca19,
title = "Scientific Workflows in IoT Environments: A Data Placement Strategy Based on Heterogeneous Edge-Cloud Computing",
abstract = "In Industry 4.0 and Internet of Things (IoT) environments, the heterogeneous edge-cloud computing paradigm can provide a more proper solution to deploy scientific workflows compared to cloud computing or other traditional distributed computing. Owing to the different sizes of scientific datasets and the privacy issue concerning some of these datasets, it is essential to find a data placement strategy that can minimize data transmission time. Some state-of-the-art data placement strategies combine edge computing and cloud computing to distribute scientific datasets. However, the dynamic distribution of newly generated datasets to appropriate datacenters and exiting the spent datasets are still a challenge during workflows execution. To address this challenge, this study not only constructs a data placement model that includes shared datasets within the individual and among multiple workflows across various geographical regions, but also proposes a data placement strategy (DYM-RL-DPS) based on algorithms of two stages. First, during the build-time stage of workflows, we use the discrete particle swarm optimization algorithm with differential evolution to pre-allocate initial datasets to proper datacenters. Then, we reformulate the dynamic datasets distribution problem as a Markov decision process and provide a reinforcement learning-based approach to learn the data placement strategy in the runtime stage of scientific workflows. Through using the heterogeneous edge-cloud computing architecture to simulate IoT environments, we designed comprehensive experiments to demonstrate the superiority of DYM-RL-DPS. The results of our strategy can effectively reduce the data transmission time as compared to other strategies.",

keywords = "Heterogeneous edge-cloud computing, IoT environments, data-sharing, scientific workflows",
author = "Xin Du and Songtao Tang and Zhihui Lu and Keke Gai and Jie Wu and Hung, {Patrick C.K.}",
note = "Publisher Copyright: {\textcopyright} 2022 Association for Computing Machinery.",
year = "2022",
month = aug,
day = "10",
doi = "10.1145/3531327",
language = "English",
volume = "13",
journal = "ACM Transactions on Management Information Systems",
issn = "2158-656X",
publisher = "Association for Computing Machinery (ACM)",
number = "4",
}

TY - JOUR
T1 - Scientific Workflows in IoT Environments
T2 - A Data Placement Strategy Based on Heterogeneous Edge-Cloud Computing
AU - Du, Xin
AU - Tang, Songtao
AU - Lu, Zhihui
AU - Gai, Keke
AU - Wu, Jie
AU - Hung, Patrick C.K.
PY - 2022/8/10
Y1 - 2022/8/10
N2 - In Industry 4.0 and Internet of Things (IoT) environments, the heterogeneous edge-cloud computing paradigm can provide a more proper solution to deploy scientific workflows compared to cloud computing or other traditional distributed computing. Owing to the different sizes of scientific datasets and the privacy issue concerning some of these datasets, it is essential to find a data placement strategy that can minimize data transmission time. Some state-of-the-art data placement strategies combine edge computing and cloud computing to distribute scientific datasets. However, the dynamic distribution of newly generated datasets to appropriate datacenters and exiting the spent datasets are still a challenge during workflows execution. To address this challenge, this study not only constructs a data placement model that includes shared datasets within the individual and among multiple workflows across various geographical regions, but also proposes a data placement strategy (DYM-RL-DPS) based on algorithms of two stages. First, during the build-time stage of workflows, we use the discrete particle swarm optimization algorithm with differential evolution to pre-allocate initial datasets to proper datacenters. Then, we reformulate the dynamic datasets distribution problem as a Markov decision process and provide a reinforcement learning-based approach to learn the data placement strategy in the runtime stage of scientific workflows. Through using the heterogeneous edge-cloud computing architecture to simulate IoT environments, we designed comprehensive experiments to demonstrate the superiority of DYM-RL-DPS. The results of our strategy can effectively reduce the data transmission time as compared to other strategies.

KW - Heterogeneous edge-cloud computing
KW - IoT environments
KW - data-sharing
KW - scientific workflows
UR - http://www.scopus.com/inward/record.url?scp=85136559261&partnerID=8YFLogxK
U2 - 10.1145/3531327
DO - 10.1145/3531327
M3 - Article
AN - SCOPUS:85136559261
SN - 2158-656X
VL - 13
JO - ACM Transactions on Management Information Systems
JF - ACM Transactions on Management Information Systems
IS - 4
M1 - 42
ER -
