Abstract
Task runtime estimation is a prerequisite for workflow scheduling in cloud data centers. However, existing runtime prediction methods for workflow activities fail to extract categorical and numerical features effectively. In this paper, we propose a runtime prediction approach for cloud workflow tasks based on multi-dimensional feature fusion. First, we construct a stacked residual recurrent neural network with an attention mechanism. It maps categorical data from a high-dimensional sparse space to a low-dimensional dense space, which improves the model's ability to parse categorical data and extract categorical features. Second, we introduce extreme gradient boosting (XGBoost) to discretize the numerical data. It maps the dense numerical input vectors to sparse representations, which strengthens the nonlinear representation of the numerical features. Third, we design a heterogeneous multi-dimensional feature fusion strategy that blends the extracted features with the original inputs to mine comprehensive knowledge for runtime prediction. Finally, we develop a prediction model that makes full use of the resulting fused features and their hidden knowledge to forecast the runtimes of cloud workflow tasks accurately. To verify the effectiveness and superiority of the proposed method, we conduct extensive experiments on a cluster dataset from a real cloud data center. The experimental results show that our approach outperforms existing algorithms and can be applied to big-data-driven runtime prediction for workflow activities in the cloud.
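The abstract does not give the exact encodings, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that "discretizing numerical data with gradient boosting" means the common leaf-encoding trick: each tree routes a sample to one leaf, and that leaf is one-hot encoded, so a dense numeric vector becomes a sparse binary one. The paper's stacked residual RNN with attention is deliberately replaced by a single dot-product attention pooling over embedding vectors. The fusion step is modeled as plain concatenation of the dense categorical part, the sparse numeric part, and the original inputs. All names (`Tree`, `discretize`, `attention_pool`, `fuse`), the toy trees, and the embedding values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Tree:
    """Hypothetical toy regression tree; a node is a leaf when `feature` is None."""
    feature: Optional[int] = None   # index into the numerical feature vector
    threshold: float = 0.0          # go left when x[feature] < threshold
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None


def leaves(tree: Tree) -> List[Tree]:
    """Leaves in left-to-right order; their positions serve as leaf ids."""
    if tree.feature is None:
        return [tree]
    return leaves(tree.left) + leaves(tree.right)


def leaf_id(tree: Tree, x: Sequence[float]) -> int:
    """Route x down the tree and return the id of the leaf it lands in."""
    node = tree
    while node.feature is not None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return next(i for i, leaf in enumerate(leaves(tree)) if leaf is node)


def discretize(trees: Sequence[Tree], x: Sequence[float]) -> List[float]:
    """Dense numeric vector -> sparse binary vector with exactly one 1 per tree."""
    out: List[float] = []
    for tree in trees:
        one_hot = [0.0] * len(leaves(tree))
        one_hot[leaf_id(tree, x)] = 1.0
        out.extend(one_hot)
    return out


def attention_pool(vectors: Sequence[Sequence[float]], query: Sequence[float]) -> List[float]:
    """Simplified stand-in for the attention RNN: softmax(v . q) weighted sum."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    top = max(scores)                      # subtract max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, vectors)) / total
            for d in range(len(vectors[0]))]


def fuse(cat_ids: Sequence[int], table: Dict[int, List[float]], query: Sequence[float],
         trees: Sequence[Tree], numeric: Sequence[float]) -> List[float]:
    """Concatenate dense categorical features, sparse numeric features, raw inputs."""
    dense_cat = attention_pool([table[i] for i in cat_ids], query)
    sparse_num = discretize(trees, numeric)
    return dense_cat + sparse_num + list(numeric)


# Hypothetical toy inputs: two categorical ids and two numeric fields.
EMBEDDINGS = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
QUERY = [1.0, 0.0]
TREES = [
    Tree(0, 0.5, Tree(), Tree()),                        # 2 leaves
    Tree(1, 2.0, Tree(0, 0.2, Tree(), Tree()), Tree()),  # 3 leaves
]
features = fuse([0, 2], EMBEDDINGS, QUERY, TREES, [0.3, 4.0])
print(len(features))  # 2 dense + 5 sparse + 2 raw = 9
```

With a trained model, the per-tree leaf indices would come from XGBoost itself, for example via `Booster.predict(..., pred_leaf=True)`. XGBoost reports leaves by node index, so they would need remapping to contiguous ids before one-hot encoding. The fused vector would then feed whatever downstream regressor predicts the runtime.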
Translated title of the contribution | Multi-dimensional Feature Fusion-based Runtime Prediction Approach for Cloud Workflow Tasks |
---|---|
Original language | Chinese (Traditional) |
Pages (from-to) | 67-78 |
Number of pages | 12 |
Journal | Zidonghua Xuebao/Acta Automatica Sinica |
Volume | 49 |
Issue number | 1 |
Publication status | Published - Jan 2023 |