SIMD-based inverted index compression algorithms

Hongfei Yan, Xudong Zhang, Dongdong Shan, Xianling Mao, Xin Zhao

Research output: Contribution to journalArticlepeer-review

1 Citation (Scopus)

Abstract

The rapid growth of text information has brought about new challenges to traditional information retrieval. In large search engines, indexing is required to help users acquire important data they need, and techniques of inverted index have great influence on the efficiency of query processing in such systems. The data in inverted index is stored in the form of arrays of integers, and techniques of compression are required to reduce the cost of storing such data in disks and memory, as well as to boost the hit rate of CPU cache and speed up transferring data. Therefore, it is necessary to choose a highly efficient compression algorithm to process query effectively. In this paper, we propose two instruction-level-parallelized algorithms, i.e. SIMD-PB and SIMD-PFD, which improve two competitive compression algorithms respectively, i.e. PackedBinary and PForDelta, and exploit SIMD instructions to accelerate the Pack and Unpack procedure in the algorithms. Experiments based on public datasets of GOV2 and ClueWeb09B show that our novel algorithms have good performance on encoding and decoding speed without impairing the compression ratio, and outperform the former fastest inverted list compression algorithms by at most 17%, with respect to decompression speed. Furthermore, experiments indicate that our novel algorithms have better performance on longer posting list and larger block size w.r.t. decoding speed.

Original languageEnglish
Pages (from-to)995-1004
Number of pages10
JournalJisuanji Yanjiu yu Fazhan/Computer Research and Development
Volume52
Issue number5
DOIs
Publication statusPublished - 1 May 2015

Keywords

  • Compression
  • Information retrieval
  • Integer encoding
  • Inverted index
  • Single instruction multiple data(SIMD)

Fingerprint

Dive into the research topics of 'SIMD-based inverted index compression algorithms'. Together they form a unique fingerprint.

Cite this