A Tree-Based Indexing Approach for Diverse Textual Similarity Search

Minghe Yu*, Chengliang Chai, Ge Yu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Textual information is ubiquitous in our lives and is becoming an important component of our cognitive society. In the age of big data, we consistently need to traverse substantial amounts of data even to find a little information. To quickly acquire effective information, it is necessary to implement a textual similarity search based on an appropriate index structure to efficiently find results. In this article, we study top-k textual similarity search and develop a tree-based indexing approach that can construct indices to support various similarity functions. Our indexing approach clusters similar records in the same branch offline to improve the performance of online search. Based on the index tree, we present a top-k search algorithm with efficient pruning techniques. The experimental results demonstrate that our algorithm can achieve higher performance and better scalability than the baseline method.
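
As a rough illustration of the idea summarized in the abstract (similar records grouped under the same branch offline, then a top-k search that prunes branches online), here is a minimal sketch. It is not the paper's algorithm. Jaccard similarity over word tokens is only one possible similarity function, the tree is built by hand instead of by clustering, and every name in it (Node, leaf, group, upper_bound, top_k) is hypothetical. The pruning bound relies on one observation: if every record under a node draws its tokens from that node's token union U, then no record under it can score above |q ∩ U| / |q| against the query q.

```python
import heapq
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets (assumed similarity function)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


@dataclass
class Node:
    tokens: frozenset                       # union of all tokens in this subtree
    children: List["Node"] = field(default_factory=list)
    record: Optional[frozenset] = None      # token set, present only on leaves
    text: str = ""                          # original record text (leaves only)


def leaf(text: str) -> Node:
    toks = frozenset(text.lower().split())
    return Node(tokens=toks, record=toks, text=text)


def group(*children: Node) -> Node:
    """Internal node; a real index would choose the grouping by clustering."""
    merged = frozenset().union(*(c.tokens for c in children))
    return Node(tokens=merged, children=list(children))


def upper_bound(query: frozenset, node: Node) -> float:
    # Any record r below node is a subset of node.tokens, so
    # |q & r| <= |q & node.tokens| and |q | r| >= |q|.
    return len(query & node.tokens) / len(query) if query else 0.0


def top_k(root: Node, query_text: str, k: int) -> List[Tuple[float, str]]:
    query = frozenset(query_text.lower().split())
    best: List[Tuple[float, str]] = []      # min-heap of (similarity, text)
    frontier = [(-upper_bound(query, root), 0, root)]  # max-heap on the bound
    counter = 1                             # tie-breaker so nodes are never compared
    while frontier:
        neg_bound, _, node = heapq.heappop(frontier)
        if len(best) == k and -neg_bound <= best[0][0]:
            break                           # no remaining branch can improve the result
        if node.record is not None:
            sim = jaccard(query, node.record)
            if len(best) < k:
                heapq.heappush(best, (sim, node.text))
            elif sim > best[0][0]:
                heapq.heapreplace(best, (sim, node.text))
            continue
        for child in node.children:
            heapq.heappush(frontier, (-upper_bound(query, child), counter, child))
            counter += 1
    return sorted(best, reverse=True)


# Hand-built two-branch tree standing in for an offline clustering step.
root = group(
    group(leaf("tree based index search"), leaf("tree index")),
    group(leaf("deep neural network"), leaf("neural network training")),
)
result = top_k(root, "tree index", 2)
```

In this example the second branch shares no tokens with the query, so its bound is 0 and the search stops before visiting either of its leaves. A tighter bound or a different similarity function would slot into upper_bound and jaccard without changing the search loop.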

Original language: English
Article number: 9187345
Pages (from-to): 8866-8876
Number of pages: 11
Journal: IEEE Access
Volume: 9
DOI: https://doi.org/10.1109/ACCESS.2020.3022057
Publication status: Published - 2021
Externally published: Yes

Keywords

  • Tree-based indexing
  • textual similarity
  • top-k similarity search

Cite this

Yu, M., Chai, C., & Yu, G. (2021). A Tree-Based Indexing Approach for Diverse Textual Similarity Search. IEEE Access, 9, 8866-8876. Article 9187345. https://doi.org/10.1109/ACCESS.2020.3022057