TY - GEN
T1 - A Data-aware Learned Index Scheme for Efficient Writes
AU - Liu, Li
AU - Li, Chunhua
AU - Zhang, Zhou
AU - Liu, Yuhan
AU - Zhou, Ke
AU - Zhang, Ji
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/8/29
Y1 - 2022/8/29
N2 - The index structure is critical for efficient data access and overall system performance in storage systems. A learned index uses recursive index models to replace a range index structure (such as the B+ Tree) and predicts the position of a lookup key in a dataset. This new paradigm greatly reduces query time and index size; however, it supports only read-only workloads. Although some studies reserve gaps between keys for new data to support updates, they incur high memory overhead and shifting costs when a large amount of data is inserted. In this paper, we propose a data-aware learned index scheme with high scalability, called EWALI, which constructs index models based on a lightweight data-aware data partition algorithm. When the data distribution changes, EWALI automatically splits the related leaf nodes and retrains the corresponding models to accommodate different workloads. In addition, EWALI designs alternating dual buffers to handle new data and adopts a delayed update mechanism to merge data, greatly reducing write locking and improving write performance. We evaluate EWALI with real-world and synthetic datasets. Extensive experimental results show that EWALI reduces write latency by 60.9% and 33.7% compared with the state-of-the-art Fitting-Tree and XIndex, respectively, and achieves up to a 3.1× performance improvement on range queries compared with XIndex.
AB - The index structure is critical for efficient data access and overall system performance in storage systems. A learned index uses recursive index models to replace a range index structure (such as the B+ Tree) and predicts the position of a lookup key in a dataset. This new paradigm greatly reduces query time and index size; however, it supports only read-only workloads. Although some studies reserve gaps between keys for new data to support updates, they incur high memory overhead and shifting costs when a large amount of data is inserted. In this paper, we propose a data-aware learned index scheme with high scalability, called EWALI, which constructs index models based on a lightweight data-aware data partition algorithm. When the data distribution changes, EWALI automatically splits the related leaf nodes and retrains the corresponding models to accommodate different workloads. In addition, EWALI designs alternating dual buffers to handle new data and adopts a delayed update mechanism to merge data, greatly reducing write locking and improving write performance. We evaluate EWALI with real-world and synthetic datasets. Extensive experimental results show that EWALI reduces write latency by 60.9% and 33.7% compared with the state-of-the-art Fitting-Tree and XIndex, respectively, and achieves up to a 3.1× performance improvement on range queries compared with XIndex.
KW - Data Partition
KW - Data-aware Model
KW - Index
KW - Updating
UR - http://www.scopus.com/inward/record.url?scp=85148626441&partnerID=8YFLogxK
U2 - 10.1145/3545008.3545077
DO - 10.1145/3545008.3545077
M3 - Conference contribution
AN - SCOPUS:85148626441
T3 - ACM International Conference Proceeding Series
BT - 51st International Conference on Parallel Processing, ICPP 2022 - Main Conference Proceedings
PB - Association for Computing Machinery
T2 - 51st International Conference on Parallel Processing, ICPP 2022
Y2 - 29 August 2022 through 1 September 2022
ER -