一种分布式异构带宽环境下的高效数据分区方法

Translated title of the contribution: An Efficient Data Partitioning Method in Distributed Heterogeneous Bandwidth Environment

Qingyun Ma, Hangxu Ji, Yuhai Zhao*, Keming Mao, Guoren Wang

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

Abstract

In distributed big data processing frameworks, a large quantity of data is transmitted over the network while a job runs, so the time spent transferring data between nodes becomes one of the main costs of the job. Data partitioning is a necessary step in big data processing, and an inefficient partitioning method can significantly increase job running time. When node bandwidths are heterogeneous, traditional methods such as hash partitioning or range partitioning are inefficient because of bandwidth bottleneck nodes. We therefore propose a model of data transmission between nodes that aims to reduce transfer time in distributed heterogeneous bandwidth networks. Based on each node's uplink and downlink bandwidth and its initial data size, the model calculates the node's optimal data distribution ratio so as to minimize data transfer time. Building on this model, we design a bandwidth-based data partitioning method that lets each node allocate data according to its optimal distribution ratio. We implement the method in the Apache Flink framework, where it significantly improves efficiency. Extensive experimental results show that the bandwidth-based data partitioning method effectively reduces the time consumed by data partitioning under distributed heterogeneous bandwidth conditions.
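The idea of a bandwidth-aware distribution ratio can be illustrated with a minimal sketch. The abstract does not give the paper's equations, so the cost model below is an assumption made for illustration only: node i holds `sizes[i]` units of data, keeps a fraction `ratios[i]` of every node's data, sends the rest over its uplink, and receives its share of the other nodes' data over its downlink; the shuffle finishes when the slowest link finishes. The function names (`transfer_time`, `optimal_ratios`) and the binary search on the deadline are hypothetical and are not taken from the paper or from Apache Flink.

```python
from typing import List, Optional


def transfer_time(ratios: List[float], sizes: List[float],
                  up: List[float], down: List[float]) -> float:
    """Shuffle time under the assumed model: the slowest uplink or downlink."""
    total = sum(sizes)
    worst = 0.0
    for r, s, u, d in zip(ratios, sizes, up, down):
        send = s * (1.0 - r) / u      # data leaving the node over its uplink
        recv = r * (total - s) / d    # data arriving over its downlink
        worst = max(worst, send, recv)
    return worst


def _ratios_for_deadline(t: float, sizes: List[float], up: List[float],
                         down: List[float]) -> Optional[List[float]]:
    """Return ratios that finish within time t, or None if none exist."""
    total = sum(sizes)
    lo, hi = [], []
    for s, u, d in zip(sizes, up, down):
        # Uplink constraint s * (1 - r) / u <= t gives a lower bound on r.
        low = max(0.0, 1.0 - t * u / s) if s > 0 else 0.0
        # Downlink constraint r * (total - s) / d <= t gives an upper bound.
        high = min(1.0, t * d / (total - s)) if total > s else 1.0
        if low > high:
            return None
        lo.append(low)
        hi.append(high)
    if sum(lo) > 1.0 or sum(hi) < 1.0:
        return None                   # the ratios could not sum to 1
    slack = sum(hi) - sum(lo)
    if slack == 0.0:
        return lo
    share = (1.0 - sum(lo)) / slack   # spread the remainder across the bounds
    return [l + share * (h - l) for l, h in zip(lo, hi)]


def optimal_ratios(sizes: List[float], up: List[float],
                   down: List[float], iters: int = 100) -> List[float]:
    """Binary-search the smallest feasible deadline under the assumed model."""
    n = len(sizes)
    best = [1.0 / n] * n              # uniform split, as hash partitioning gives
    t_lo, t_hi = 0.0, transfer_time(best, sizes, up, down)
    for _ in range(iters):
        mid = (t_lo + t_hi) / 2.0
        candidate = _ratios_for_deadline(mid, sizes, up, down)
        if candidate is None:
            t_lo = mid
        else:
            best, t_hi = candidate, mid
    return best


if __name__ == "__main__":
    # Four nodes with equal data; the last one has a slow downlink.
    sizes = [100.0, 100.0, 100.0, 100.0]
    up = [10.0, 10.0, 10.0, 10.0]
    down = [10.0, 10.0, 10.0, 1.0]
    uniform = [0.25] * 4
    tuned = optimal_ratios(sizes, up, down)
    print("uniform time:", transfer_time(uniform, sizes, up, down))
    print("tuned ratios:", tuned)
    print("tuned time:  ", transfer_time(tuned, sizes, up, down))
```

In this toy setting the node with the slow downlink is assigned a smaller share than the uniform split would give it, which is the effect the abstract attributes to bandwidth bottleneck nodes. The actual model and its integration with Flink's partitioners are described in the paper itself.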

Original language: Chinese (Traditional)
Pages (from-to): 2683-2693
Number of pages: 11
Journal: Jisuanji Yanjiu yu Fazhan/Computer Research and Development
Volume: 57
Issue number: 12
Publication status: Published - Dec 2020
