TY - GEN
T1 - Joint Directory, File and IO Trace Feature Extraction and Feature-based Trace Regeneration for Enterprise Storage Systems
AU - Huang, Kecheng
AU - Li, Xijun
AU - Yuan, Mingxuan
AU - Zhang, Ji
AU - Shao, Zili
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - For enterprise storage systems, users' directory/file and I/O access traces are critical for fine-tuning and new designs. However, once these systems are deployed, only trace features with small sizes are allowed to be sent back to vendors. Therefore, it is crucial to develop effective techniques for highly compressed feature extraction and feature-based high-fidelity trace regeneration. Existing works primarily focus on I/O trace modeling and regeneration without considering the directory/file access information. In this paper, we propose a new technique, called Sketcher, that can sketch massive traces into highly compressed 'joint features' with both directory/file and I/O characteristics, and then regenerate high-fidelity traces from these features with a learning-based approach. For trace feature extraction, one key idea is to divide traces into multiple distance-associated segments, where each segment contains all files and I/O accesses operating under the same directory, and the differences between segments are represented as displacements of segments inside the directory tree. A dynamic weight scaling technique is proposed to further compress features considering feature criticality and the size quota, thereby achieving high compression ratios while retaining critical characteristics (e.g., abnormal I/O access patterns). For trace regeneration, a new learning-based RNN model is proposed to regenerate high-fidelity traces from extracted features based on sampling directory trees. We have implemented a fully functional prototype based on typical enterprise storage systems and evaluated Sketcher with real applications and benchmarks on a Huawei OceanStor Dorado storage server. Results show that Sketcher can effectively extract features with marginal runtime overheads while achieving compression ratios of up to 15.2K and regenerating high-fidelity traces.
AB - For enterprise storage systems, users' directory/file and I/O access traces are critical for fine-tuning and new designs. However, once these systems are deployed, only trace features with small sizes are allowed to be sent back to vendors. Therefore, it is crucial to develop effective techniques for highly compressed feature extraction and feature-based high-fidelity trace regeneration. Existing works primarily focus on I/O trace modeling and regeneration without considering the directory/file access information. In this paper, we propose a new technique, called Sketcher, that can sketch massive traces into highly compressed 'joint features' with both directory/file and I/O characteristics, and then regenerate high-fidelity traces from these features with a learning-based approach. For trace feature extraction, one key idea is to divide traces into multiple distance-associated segments, where each segment contains all files and I/O accesses operating under the same directory, and the differences between segments are represented as displacements of segments inside the directory tree. A dynamic weight scaling technique is proposed to further compress features considering feature criticality and the size quota, thereby achieving high compression ratios while retaining critical characteristics (e.g., abnormal I/O access patterns). For trace regeneration, a new learning-based RNN model is proposed to regenerate high-fidelity traces from extracted features based on sampling directory trees. We have implemented a fully functional prototype based on typical enterprise storage systems and evaluated Sketcher with real applications and benchmarks on a Huawei OceanStor Dorado storage server. Results show that Sketcher can effectively extract features with marginal runtime overheads while achieving compression ratios of up to 15.2K and regenerating high-fidelity traces.
KW - feature-based trace regeneration
KW - trace feature extraction
KW - trace-based evaluation
UR - http://www.scopus.com/inward/record.url?scp=85200471496&partnerID=8YFLogxK
U2 - 10.1109/ICDE60146.2024.00307
DO - 10.1109/ICDE60146.2024.00307
M3 - Conference contribution
AN - SCOPUS:85200471496
T3 - Proceedings - International Conference on Data Engineering
SP - 4002
EP - 4015
BT - Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
PB - IEEE Computer Society
T2 - 40th IEEE International Conference on Data Engineering, ICDE 2024
Y2 - 13 May 2024 through 17 May 2024
ER -