A top-k spatial join querying processing algorithm based on spark

Baiyou Qiao*, Bing Hu, Junhai Zhu, Gang Wu, Christophe Giraud-Carrier, Guoren Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)

Abstract

To address top-k spatial join query processing in cloud computing systems, a Spark-based top-k spatial join (STKSJ) query processing algorithm is proposed. The algorithm divides the whole data space into equal-sized grid cells using a grid partitioning method and projects each spatial object of one data set into a grid cell. It then computes the Minimum Bounding Rectangle (MBR) of all spatial objects in each grid cell. Spatial objects of the other data set that overlap these MBRs are replicated to the corresponding grid cells; objects that cannot contribute any join result are thereby filtered out, which reduces the cost of subsequent spatial join processing. An improved plane sweeping algorithm is also proposed that speeds up scanning and applies threshold filtering, greatly reducing the communication and computation costs that intermediate join results incur in the subsequent top-k aggregation operations. Experimental results on synthetic and real data sets show that the proposed algorithm outperforms existing top-k spatial join query processing algorithms.
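The pipeline described in the abstract can be illustrated with a minimal single-machine sketch in plain Python. It is not the authors' Spark implementation, and several details are assumptions made only for illustration: rectangles are `(xmin, ymin, xmax, ymax)` tuples, an object's score is the number of objects in the other data set that it overlaps, each object of the first data set is assigned to the cell containing its lower-left corner, and the threshold filter skips a cell whose candidate count cannot beat the current k-th best score. The names `partition`, `sweep_counts`, and `top_k_join` are hypothetical.

```python
import heapq
from collections import defaultdict

# Assumption: a rectangle is a tuple (xmin, ymin, xmax, ymax).

def overlaps(a, b):
    """True if rectangles a and b intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def partition(r_objs, s_objs, cell_size):
    """Grid-partition R, then replicate to each cell the S objects that
    overlap the MBR of the R objects assigned to that cell."""
    cells = defaultdict(list)
    for rid, r in r_objs.items():
        # Assumption: project an object by its lower-left corner.
        cells[(int(r[0] // cell_size), int(r[1] // cell_size))].append(rid)
    parts = {}
    for cell, rids in cells.items():
        rects = [r_objs[i] for i in rids]
        mbr = (min(r[0] for r in rects), min(r[1] for r in rects),
               max(r[2] for r in rects), max(r[3] for r in rects))
        # S objects that miss the MBR cannot join with anything in this cell.
        sids = [sid for sid, s in s_objs.items() if overlaps(s, mbr)]
        parts[cell] = (rids, sids)
    return parts

def sweep_counts(r_items, s_items):
    """Plane sweep along x: for each R rectangle, count overlapping S rectangles."""
    r_items = sorted(r_items, key=lambda it: it[1][0])
    s_items = sorted(s_items, key=lambda it: it[1][0])
    counts = {rid: 0 for rid, _ in r_items}
    i = j = 0
    while i < len(r_items) and j < len(s_items):
        (rid, r), (_, s) = r_items[i], s_items[j]
        if r[0] <= s[0]:
            # r starts first: scan S rectangles that start within r's x-extent.
            k = j
            while k < len(s_items) and s_items[k][1][0] <= r[2]:
                t = s_items[k][1]
                if r[1] <= t[3] and t[1] <= r[3]:
                    counts[rid] += 1
                k += 1
            i += 1
        else:
            # s starts first: scan R rectangles that start within s's x-extent.
            k = i
            while k < len(r_items) and r_items[k][1][0] <= s[2]:
                tid, t = r_items[k]
                if s[1] <= t[3] and t[1] <= s[3]:
                    counts[tid] += 1
                k += 1
            j += 1
    return counts

def top_k_join(r_objs, s_objs, k, cell_size):
    """Return up to k (score, id) pairs for R, highest score first."""
    heap = []  # min-heap; heap[0][0] is the current threshold once it holds k items
    for rids, sids in partition(r_objs, s_objs, cell_size).values():
        # Threshold filter: no object in this cell can score above len(sids).
        if len(heap) == k and len(sids) <= heap[0][0]:
            continue
        counts = sweep_counts([(i, r_objs[i]) for i in rids],
                              [(i, s_objs[i]) for i in sids])
        for rid, score in counts.items():
            if len(heap) < k:
                heapq.heappush(heap, (score, rid))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, rid))
    return sorted(heap, reverse=True)

R = {"r1": (0, 0, 4, 4), "r2": (10, 10, 12, 12), "r3": (20, 20, 21, 21)}
S = {"s1": (1, 1, 2, 2), "s2": (3, 3, 5, 5), "s3": (3.5, 0, 6, 1),
     "s4": (11, 11, 13, 13), "s5": (30, 30, 31, 31)}
print(top_k_join(R, S, k=2, cell_size=10.0))  # [(3, 'r1'), (1, 'r2')]
```

Because every object of the first data set lives in exactly one cell and every potential partner overlaps that cell's MBR, each score is complete within its cell, so the per-cell results can be merged without deduplication. In a Spark setting the loop over cells would presumably run as parallel tasks over partitions, with the threshold limiting how many intermediate results reach the final top-k aggregation.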

Original language: English
Article number: 101419
Journal: Information Systems
Volume: 87
Publication status: Published - Jan 2020
Externally published: Yes

Keywords

  • Cloud computing
  • Plane sweeping algorithm
  • Spark platform
  • Top-k spatial join query
