TY - GEN
T1 - Consistent snapshot algorithms for in-memory database systems
T2 - 34th IEEE International Conference on Data Engineering, ICDE 2018
AU - Li, Liang
AU - Wang, Guoren
AU - Wu, Gang
AU - Yuan, Ye
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/10/24
Y1 - 2018/10/24
N2 - In-memory databases (IMDBs) are gaining increasing popularity in big data applications, where clients commit updates intensively. Consistent snapshot is a key step in backup and recovery of IMDBs, thus an important factor for system performance of IMDBs. Formally, the in-memory consistent snapshot problem refers to taking an in-memory consistent time-in-point snapshot with the constraints that 1) clients can read the latest data items, and 2) any data item in the snapshot should not be overwritten. Various snapshot algorithms have been proposed in the academia to trade off throughput and latency, yet industrial IMDBs such as Redis still stick to the simple fork algorithm. As an understanding of this phenomenon, we conduct comprehensive performance evaluations on mainstream snapshot algorithms. Surprisingly, we observe that the simple fork algorithm indeed outperforms the state-of-The-Arts in update-intensive workload scenarios. On this basis, we identify the drawbacks of existing research and propose two lightweight improvements. Extensive evaluations on synthetic data and Redis show that our lightweight improvements yield better performance than fork, the current industrial standard, and the representative snapshot algorithms from the academia. Finally, we have opensourced the implementation of all the above snapshot algorithms to facilitate practitioners to benchmark the performance of each algorithm and select proper methods for different application scenarios.
AB - In-memory databases (IMDBs) are gaining increasing popularity in big data applications, where clients commit updates intensively. Consistent snapshot is a key step in backup and recovery of IMDBs, thus an important factor for system performance of IMDBs. Formally, the in-memory consistent snapshot problem refers to taking an in-memory consistent time-in-point snapshot with the constraints that 1) clients can read the latest data items, and 2) any data item in the snapshot should not be overwritten. Various snapshot algorithms have been proposed in the academia to trade off throughput and latency, yet industrial IMDBs such as Redis still stick to the simple fork algorithm. As an understanding of this phenomenon, we conduct comprehensive performance evaluations on mainstream snapshot algorithms. Surprisingly, we observe that the simple fork algorithm indeed outperforms the state-of-The-Arts in update-intensive workload scenarios. On this basis, we identify the drawbacks of existing research and propose two lightweight improvements. Extensive evaluations on synthetic data and Redis show that our lightweight improvements yield better performance than fork, the current industrial standard, and the representative snapshot algorithms from the academia. Finally, we have opensourced the implementation of all the above snapshot algorithms to facilitate practitioners to benchmark the performance of each algorithm and select proper methods for different application scenarios.
KW - Consistent Snapshot
KW - In Memory databases
UR - http://www.scopus.com/inward/record.url?scp=85057111178&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2018.00131
DO - 10.1109/ICDE.2018.00131
M3 - Conference contribution
AN - SCOPUS:85057111178
T3 - Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018
SP - 1264
EP - 1267
BT - Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 April 2018 through 19 April 2018
ER -