TY - GEN
T1 - Analysis of Indexing Structures for Immutable Data
AU - Yue, Cong
AU - Xie, Zhongle
AU - Zhang, Meihui
AU - Chen, Gang
AU - Ooi, Beng Chin
AU - Wang, Sheng
AU - Xiao, Xiaokui
N1 - Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/6/14
Y1 - 2020/6/14
N2 - In emerging applications such as blockchains and collaborative data analytics, there are strong demands for data immutability, multi-version accesses, and tamper-evident controls. To provide efficient support for lookup and merge operations, three new index structures for immutable data, namely Merkle Patricia Trie (MPT), Merkle Bucket Tree(MBT), and Pattern-Oriented-Split Tree (POS-Tree), have been proposed. Although these structures have been adopted in real applications, there is no systematic evaluation of their pros and cons in the literature, making it difficult for practitioners to choose the right index structure for their applications. To alleviate the above problem, we present a comprehensive analysis of the existing index structures for immutable data, and evaluate both their asymptotic and empirical performance. Specifically, we show that MPT, MBT, and POS-Tree are all instances of a recently proposed framework, dubbed Structurally Invariant and Reusable Indexes (SIRI). We propose to evaluate the SIRI instances on their index performance and deduplication capability. We establish the worst-case guarantees of each index, and experimentally evaluate all indexes in a wide variety of settings. Based on our theoretical and empirical analysis, we conclude that POS-Tree is a favorable choice for indexing immutable data.
AB - In emerging applications such as blockchains and collaborative data analytics, there are strong demands for data immutability, multi-version accesses, and tamper-evident controls. To provide efficient support for lookup and merge operations, three new index structures for immutable data, namely Merkle Patricia Trie (MPT), Merkle Bucket Tree(MBT), and Pattern-Oriented-Split Tree (POS-Tree), have been proposed. Although these structures have been adopted in real applications, there is no systematic evaluation of their pros and cons in the literature, making it difficult for practitioners to choose the right index structure for their applications. To alleviate the above problem, we present a comprehensive analysis of the existing index structures for immutable data, and evaluate both their asymptotic and empirical performance. Specifically, we show that MPT, MBT, and POS-Tree are all instances of a recently proposed framework, dubbed Structurally Invariant and Reusable Indexes (SIRI). We propose to evaluate the SIRI instances on their index performance and deduplication capability. We establish the worst-case guarantees of each index, and experimentally evaluate all indexes in a wide variety of settings. Based on our theoretical and empirical analysis, we conclude that POS-Tree is a favorable choice for indexing immutable data.
KW - deduplication
KW - immutable data
KW - indexing
KW - versioning
UR - http://www.scopus.com/inward/record.url?scp=85086255938&partnerID=8YFLogxK
U2 - 10.1145/3318464.3389773
DO - 10.1145/3318464.3389773
M3 - Conference contribution
AN - SCOPUS:85086255938
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
SP - 925
EP - 935
BT - SIGMOD 2020 - Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
PB - Association for Computing Machinery
T2 - 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD 2020
Y2 - 14 June 2020 through 19 June 2020
ER -