TY - JOUR
T1 - A Multi-Way Spatial Join Querying Processing Algorithm Based on Spark
AU - Qiao, Baiyou
AU - Zhu, Junhai
AU - Zheng, Yujie
AU - Shen, Muchuan
AU - Wang, Guoren
N1 - Publisher Copyright:
© 2017, Science Press. All right reserved.
PY - 2017/7/1
Y1 - 2017/7/1
N2 - Aiming at the problem of spatial join query processing in cloud computing systems, a multi-way spatial join query processing algorithm BSMWSJ is proposed, which is based on Spark platform. In this algorithm, the whole data space is divided into grid cells with the same size by grid partition method, and spatial objects in each type data set are distributed into these grid cells according to their spatial locations. Spatial objects in different grid cells are processed in parallel. In multi-way spatial join query processing, a boundary filtering method is proposed to filter the useless data, which calculates the MBRs of the candidate results generated by the previous join processing, and uses these MBRs to filter the subsequent join data sets. This allows it to filter out the useless spatial objects, and reduce the redundant projection and replication of spatial objects. At the same time, a duplication avoidance strategy is applied to reduce the outputs of redundant results, and further minimizes the cost of the subsequent join processing. Many experiments on synthetic and real data sets show that the proposed multi-way spatial join query processing algorithm BSMWSJ has obvious advantages and better performance than the existing multi-way spatial join query processing algorithms.
AB - Aiming at the problem of spatial join query processing in cloud computing systems, a multi-way spatial join query processing algorithm BSMWSJ is proposed, which is based on Spark platform. In this algorithm, the whole data space is divided into grid cells with the same size by grid partition method, and spatial objects in each type data set are distributed into these grid cells according to their spatial locations. Spatial objects in different grid cells are processed in parallel. In multi-way spatial join query processing, a boundary filtering method is proposed to filter the useless data, which calculates the MBRs of the candidate results generated by the previous join processing, and uses these MBRs to filter the subsequent join data sets. This allows it to filter out the useless spatial objects, and reduce the redundant projection and replication of spatial objects. At the same time, a duplication avoidance strategy is applied to reduce the outputs of redundant results, and further minimizes the cost of the subsequent join processing. Many experiments on synthetic and real data sets show that the proposed multi-way spatial join query processing algorithm BSMWSJ has obvious advantages and better performance than the existing multi-way spatial join query processing algorithms.
KW - Boundary filtering
KW - Cloud computing
KW - Duplication avoidance
KW - Multi-way spatial join query
KW - Spark platform
UR - https://www.scopus.com/pages/publications/85031089151
U2 - 10.7544/issn1000-1239.2017.20160558
DO - 10.7544/issn1000-1239.2017.20160558
M3 - Article
AN - SCOPUS:85031089151
SN - 1000-1239
VL - 54
SP - 1592
EP - 1602
JO - Jisuanji Yanjiu yu Fazhan/Computer Research and Development
JF - Jisuanji Yanjiu yu Fazhan/Computer Research and Development
IS - 7
ER -