TY - JOUR
T1 - Optimization Method of Projection and Order for Multiple Tables Join
AU - Zong, Fengbo
AU - Zhao, Yuhai
AU - Wang, Guoren
AU - Ji, Hangxu
N1 - Publisher Copyright:
© 2022, Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - Multi-table join is a common operation in big data processing. As with ordinary join operations in databases, the order in which the tables are joined strongly affects the consumption of computing and transmission resources. Optimizing the join order of multiple tables is a classical optimization problem, and the size of the projected result of each join also affects the volume of data transmitted between nodes. The overall join order and the projection applied at each join therefore both have a significant impact on join efficiency. Traditional optimization strategies, however, often consider neither the choice of intermediate projections nor the effect of those projections on the optimal join strategy. To address this problem, this paper establishes a connection relation index, which is used while constructing the optimized join strategy to adjust the projection of each join, delete redundant columns promptly, and reduce the consumption of transmission resources. In addition, an optimization strategy that adjusts the join order based on the projections reduces the consumption of transmission and computing resources as far as possible. The optimization strategy is implemented and tested in the Flink system, and the results show a significant optimization effect.
AB - Multi-table join is a common operation in big data processing. As with ordinary join operations in databases, the order in which the tables are joined strongly affects the consumption of computing and transmission resources. Optimizing the join order of multiple tables is a classical optimization problem, and the size of the projected result of each join also affects the volume of data transmitted between nodes. The overall join order and the projection applied at each join therefore both have a significant impact on join efficiency. Traditional optimization strategies, however, often consider neither the choice of intermediate projections nor the effect of those projections on the optimal join strategy. To address this problem, this paper establishes a connection relation index, which is used while constructing the optimized join strategy to adjust the projection of each join, delete redundant columns promptly, and reduce the consumption of transmission resources. In addition, an optimization strategy that adjusts the join order based on the projections reduces the consumption of transmission and computing resources as far as possible. The optimization strategy is implemented and tested in the Flink system, and the results show a significant optimization effect.
KW - big data
KW - join optimization
KW - project optimization
UR - https://www.scopus.com/pages/publications/85131297348
U2 - 10.3778/j.issn.1673-9418.2009099
DO - 10.3778/j.issn.1673-9418.2009099
M3 - Article
AN - SCOPUS:85131297348
SN - 1673-9418
VL - 16
SP - 106
EP - 119
JO - Journal of Frontiers of Computer Science and Technology
JF - Journal of Frontiers of Computer Science and Technology
IS - 1
ER -