TY - JOUR
T1 - An optimized learning-based directory placement policy with two-rounds selection in distributed file systems
AU - Wang, Yuanzhang
AU - Yang, Fengkui
AU - Zhou, Ke
AU - Li, Chunhua
AU - Liu, Chong
AU - Zhang, Ji
AU - Cheng, Zhuo
N1 - Publisher Copyright: © 2023
PY - 2024/5
Y1 - 2024/5
AB - Load balancing is a critical problem in distributed file systems. Previous work focuses on distributing data across nodes at the file level, often overlooking the benefits of exploiting directory locality and the long duration of directory hotness. This oversight may impair load balance and degrade performance. To overcome these shortcomings, we propose OLDP, an optimized learning-based directory placement policy with two-round selection that determines the data layout by predicting the load. Specifically, we establish a relationship between directory request features and state information to predict the state of each directory (storage capacity, bandwidth, and IOPS). We then propose a two-round selection, multidimensional resource allocation policy for placing directories in hybrid storage. On the one hand, it weighs the trade-off between same-category directories and peer directories; on the other hand, it avoids overloading the nodes with fast devices. Extensive experiments demonstrate that OLDP not only efficiently alleviates load imbalance but also improves performance in practice. Specifically, in a hybrid storage system, OLDP improves service latency, IOPS, and bandwidth by 16%, 26%, and 25%, respectively, compared to the state-of-the-art method. In a practical all-flash storage system, OLDP reduces service latency by 36% and increases IOPS and bandwidth by 8% and 9%, respectively.
KW - Data placement
KW - DFS
KW - Directory placement
KW - Load balance
KW - Load prediction
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85182393547&partnerID=8YFLogxK
U2 - 10.1016/j.future.2023.12.012
DO - 10.1016/j.future.2023.12.012
M3 - Article
AN - SCOPUS:85182393547
SN - 0167-739X
VL - 154
SP - 235
EP - 250
JO - Future Generation Computer Systems
JF - Future Generation Computer Systems
ER -