joinTree: A novel join-oriented multivariate operator for spatio-temporal data management in Flink

Hangxu Ji, Gang Wu*, Yuhai Zhao, Shiye Wang, Guoren Wang, George Y. Yuan

*此作品的通讯作者

科研成果: 期刊稿件文章同行评审

1 引用 (Scopus)

摘要

In the era of intelligent Internet, the management and analysis of massive spatio-temporal data is one of the important links to realize intelligent applications and build smart cities, in which the interaction of multi-source data is the basis of realizing spatio-temporal data management and analysis. As an important carrier to achieve the interactive calculation of massive data, Flink provides the advanced Operator Join to facilitate user program development. In a Flink job with multi-source data connection operations, the selection of join sequences and the data communication in the repartition phase are both key factors that affect the efficiency of the job. However, Flink does not provide any optimization mechanism for the two factors, which in turn leads to low job efficiency. If the enumeration method is used to find the optimal join sequence, the result will not be obtained in polynomial time, so the optimization effect cannot be achieved. We investigate the above problems, design and implement a more advanced Operator joinTree that can support multi-source data connection in Flink, and introduce two optimization strategies into the Operator. In summary, the advantages of our work are highlighted as follows: (1) the Operator enables Flink to support multi-source data connection operation, and reduces the amount of calculation and data communication by introducing lightweight optimization strategies to improve job efficiency; (2) with the optimization strategy for join sequence, the total running time can be reduced by 29% and the data communication can be reduced by 34% compared with traditional sequential execution; (3) the optimization strategy for data repartition can further enable the job to bring 35% performance improvement, and in the average case can reduce the data communication by 43%.

源语言英语
页(从-至)107-132
页数26
期刊GeoInformatica
27
1
DOI
出版状态已出版 - 1月 2023

指纹

探究 'joinTree: A novel join-oriented multivariate operator for spatio-temporal data management in Flink' 的科研主题。它们共同构成独一无二的指纹。

引用此