A Data-aware Learned Index Scheme for Efficient Writes

Li Liu, Chunhua Li*, Zhou Zhang, Yuhan Liu, Ke Zhou, Ji Zhang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Index structure is very important for efficient data access and system performance in the storage system. Learned index utilizes recursive index models to replace range index structure (such as B+ Tree) so as to predict the position of a lookup key in a dataset. This new paradigm greatly reduces query time and index size, however it only supports read-only workloads. Although some studies reserve gaps between keys for new data to support update, they incur high memory space and shift cost when a large number of data are inserted. In this paper, we propose a data-aware learned index scheme with high scalability, called EWALI, which constructs index models based on a lightweight data-aware data partition algorithm. When the data distribution changes, EWALI can automatically split the related leaf nodes and retrain the corresponding models to accommodate different workloads. In addition, EWALI designs an alternative duel buffers to handle new data and adopts the delayed update mechanism to merge data, greatly reducing write locking and improving write performance. We evaluate EWALI with real-world and synthetic datasets. Extensive experimental results show that EWALI reduces write latency respectively by 60.9% and 33.7% than state-of-the-art Fitting-Tree and XIndex, and achieves up to 3.1 × performance improvement in terms of range query comparing with XIndex.

Original languageEnglish
Title of host publication51st International Conference on Parallel Processing, ICPP 2022 - Main Conference Proceedings
PublisherAssociation for Computing Machinery
ISBN (Electronic)9781450397339
DOIs
Publication statusPublished - 29 Aug 2022
Externally publishedYes
Event51st International Conference on Parallel Processing, ICPP 2022 - Virtual, Online, France
Duration: 29 Aug 20221 Sept 2022

Publication series

NameACM International Conference Proceeding Series

Conference

Conference51st International Conference on Parallel Processing, ICPP 2022
Country/TerritoryFrance
CityVirtual, Online
Period29/08/221/09/22

Keywords

  • Data Partition
  • Data-aware Model
  • Index
  • Updating

Fingerprint

Dive into the research topics of 'A Data-aware Learned Index Scheme for Efficient Writes'. Together they form a unique fingerprint.

Cite this