Abstract
Geo-textual similarity join is a fundamental operation in spatial databases. With the continued proliferation of location-based social media, geo-textual data has become increasingly available over the past decades. Mobile users and location-based service providers may want to receive up-to-date similarity join results for massive-scale geo-textual objects over data streams. In this light, we propose and study a novel problem, the Continuous Geo-Textual Similarity Join (CGTS-Join). Specifically, given a collection of geo-textual objects Q and a dynamic set of geo-textual objects P arriving over geo-textual data streams, the CGTS-Join problem is to continuously maintain an up-to-date join result set containing object pairs whose two objects are similar to each other. For this purpose, we define an effective similarity metric that measures the similarity between two geo-textual objects by taking spatial, textual, and temporal aspects into consideration. Based on this metric, we develop a Hybrid Grid Indexing Structure (HGI) and a tri-filtering framework that together answer the CGTS-Join problem efficiently. We conduct extensive experiments on two real-world datasets to confirm the performance superiority of our proposed method.
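To make the problem statement concrete, the following is a minimal sketch of a brute-force geo-textual similarity join. The abstract does not give the actual metric, so everything here is an assumption made for illustration: the `GeoTextObject` type, the weighted combination of spatial proximity and Jaccard keyword overlap, the exponential temporal decay, and the parameters `alpha`, `max_dist`, and `half_life` are all hypothetical. It does not reproduce HGI or the tri-filtering framework; it only shows the kind of pairwise check that such an index would aim to avoid running exhaustively.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTextObject:
    """Hypothetical geo-textual object: location, keywords, arrival time."""
    oid: str
    x: float
    y: float
    keywords: frozenset
    timestamp: float


def similarity(p, q, now, alpha=0.5, max_dist=10.0, half_life=3600.0):
    """Illustrative metric (an assumption, not the paper's definition).

    Combines spatial proximity and Jaccard keyword overlap, then
    discounts the score by the age of the streamed object p.
    """
    # Spatial part: 1 at distance 0, falling linearly to 0 at max_dist.
    dist = math.hypot(p.x - q.x, p.y - q.y)
    spatial = max(0.0, 1.0 - dist / max_dist)

    # Textual part: Jaccard similarity of the two keyword sets.
    union = p.keywords | q.keywords
    textual = len(p.keywords & q.keywords) / len(union) if union else 0.0

    # Temporal part: the score halves every half_life seconds of age.
    age = max(0.0, now - p.timestamp)
    decay = 0.5 ** (age / half_life)

    return decay * (alpha * spatial + (1.0 - alpha) * textual)


def naive_join(stream_objects, static_objects, now, threshold):
    """Brute-force join: every (p, q) pair scoring at least threshold."""
    return {
        (p.oid, q.oid)
        for p in stream_objects
        for q in static_objects
        if similarity(p, q, now) >= threshold
    }
```

Because the temporal factor shrinks as `now` advances, a pair that qualifies at one moment can drop out later, which is why the result set has to be maintained continuously rather than computed once. The brute-force version costs |P| × |Q| similarity evaluations per refresh, which is the cost an index and filtering framework are meant to reduce.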
Original language | English |
---|---|
Pages (from-to) | 933-947 |
Number of pages | 15 |
Journal | World Wide Web |
Volume | 26 |
Issue number | 3 |
Publication status | Published - May 2023 |
Keywords
- Geo-textual
- Keyword
- Spatial
- Stream