Continuous similarity join over geo-textual data streams

Hongwei Liu*, Yongjiao Sun*, Guoren Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Geo-textual similarity join is a fundamental operation in spatial databases. With the continued proliferation of location-based social media, geo-textual data has become increasingly available over the past decades. Mobile users and location-based service providers may want to receive up-to-date similarity join results for massive-scale geo-textual objects over data streams. In this light, we propose and study a novel problem, Continuous Geo-Textual Similarity Join (CGTS-Join). Specifically, given a collection of geo-textual objects Q and a dynamic set of geo-textual objects P arriving over geo-textual data streams, the CGTS-Join problem is to continuously maintain an up-to-date join result set containing object pairs such that the objects of each pair are similar to each other. To this end, we define an effective similarity metric that measures the similarity between two geo-textual objects by taking spatial, textual, and temporal aspects into consideration. Based on this similarity metric, we develop a Hybrid Grid Indexing Structure (HGI) and a tri-filtering framework that together answer the CGTS-Join problem efficiently. We conduct extensive experiments on two real-world datasets to confirm the performance superiority of our proposed method.
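
To make the problem statement concrete, the following is a minimal brute-force sketch of a continuous geo-textual join. It is not the paper's method: the abstract does not give the similarity metric, the HGI index, or the tri-filtering framework, so everything here is an assumption made for illustration. The score (a weighted sum of spatial proximity and keyword Jaccard similarity, multiplied by an exponential recency decay), the parameters `max_dist`, `alpha`, and `decay`, and the names `GeoTextObject`, `similarity`, and `NaiveContinuousJoin` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTextObject:
    """Hypothetical geo-textual object: location, keywords, arrival time."""
    oid: str
    x: float
    y: float
    keywords: frozenset
    timestamp: float


def similarity(p, q, now, max_dist=10.0, alpha=0.5, decay=0.01):
    """Assumed score, NOT the paper's metric.

    Combines spatial proximity and keyword Jaccard similarity, then
    discounts the result by the age of the streaming object p.
    """
    dist = math.hypot(p.x - q.x, p.y - q.y)
    spatial = max(0.0, 1.0 - dist / max_dist)
    union = p.keywords | q.keywords
    textual = len(p.keywords & q.keywords) / len(union) if union else 0.0
    recency = math.exp(-decay * max(0.0, now - p.timestamp))
    return (alpha * spatial + (1.0 - alpha) * textual) * recency


class NaiveContinuousJoin:
    """Brute-force baseline: re-evaluates every (p, q) pair on demand."""

    def __init__(self, static_objects, threshold):
        self.q_objects = list(static_objects)  # the collection Q
        self.threshold = threshold
        self.window = {}  # live streaming objects P, keyed by oid

    def insert(self, p):
        self.window[p.oid] = p

    def expire(self, oid):
        self.window.pop(oid, None)

    def result(self, now):
        """Return the set of (p.oid, q.oid) pairs similar at time `now`."""
        return {
            (p.oid, q.oid)
            for p in self.window.values()
            for q in self.q_objects
            if similarity(p, q, now) >= self.threshold
        }
```

This baseline costs O(|P| · |Q|) per evaluation, which illustrates why an index and filtering steps are needed at stream scale; a pair can also drop out of the result purely through the passage of time, because the assumed recency factor shrinks as `now` advances.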

Original language: English
Pages (from-to): 933-947
Number of pages: 15
Journal: World Wide Web
Volume: 26
Issue number: 3
Publication status: Published - May 2023

Keywords

  • Geo-textual
  • Keyword
  • Spatial
  • Stream
