Joint Directory, File and IO Trace Feature Extraction and Feature-based Trace Regeneration for Enterprise Storage Systems

Kecheng Huang, Xijun Li, Mingxuan Yuan, Ji Zhang, Zili Shao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

For enterprise storage systems, users' directory/file and I/O access traces are critical for fine-tuning and new designs. However, once these systems are deployed, only small trace features are allowed to be sent back to vendors. Therefore, it is crucial to develop effective techniques for highly compressed feature extraction and feature-based, high-fidelity trace regeneration. Existing works primarily focus on I/O trace modeling and regeneration without considering directory/file access information. In this paper, we propose a new technique, called Sketcher, that sketches massive traces into highly compressed 'joint features' with both directory/file and I/O characteristics, and then regenerates high-fidelity traces from these features with a learning-based approach. For trace feature extraction, one key idea is to divide traces into multiple distance-associated segments, where each segment contains all file and I/O accesses operating under the same directory, and the differences between segments are represented as the displacement of a segment inside the directory tree. A dynamic weight scaling technique is proposed to further compress features according to feature criticality and the size quota, thereby achieving high compression ratios while preserving critical characteristics (e.g., abnormal I/O access patterns). For trace regeneration, a new learning-based RNN model is proposed to regenerate high-fidelity traces from the extracted features based on sampling directory trees. We have implemented a fully functional prototype based on typical enterprise storage systems and evaluated Sketcher with real applications and benchmarks on a Huawei OceanStor Dorado storage server. Results show that Sketcher can effectively extract features with marginal runtime overheads while achieving compression ratios of up to 15.2K and regenerating high-fidelity traces.
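The segmentation idea in the abstract can be illustrated with a minimal sketch. This is not Sketcher's implementation: the record layout `(path, op, offset, size)`, the function names `segment_by_directory` and `displacement`, and the choice to treat a segment as a run of consecutive accesses under the same parent directory are all assumptions made for illustration. Displacement is modelled here as the number of steps up to the common ancestor and the number of steps down to the next segment's directory.

```python
from itertools import groupby
from pathlib import PurePosixPath


def segment_by_directory(trace):
    """Group consecutive records that share a parent directory (assumed segment rule)."""
    def parent(rec):
        return str(PurePosixPath(rec[0]).parent)

    return [(directory, list(group)) for directory, group in groupby(trace, key=parent)]


def displacement(src, dst):
    """Return (levels_up, levels_down) needed to move from directory src to dst."""
    a = PurePosixPath(src).parts
    b = PurePosixPath(dst).parts
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common, len(b) - common)


# Hypothetical trace records: (path, operation, offset, size in bytes)
trace = [
    ("/a/b/f1", "read", 0, 4096),
    ("/a/b/f2", "write", 0, 8192),
    ("/a/c/d/f3", "read", 4096, 4096),
    ("/a/f4", "read", 0, 512),
]

segments = segment_by_directory(trace)
directories = [directory for directory, _ in segments]
# Displacement between each pair of consecutive segments
moves = [displacement(s, d) for s, d in zip(directories, directories[1:])]
```

Under these assumptions, the four records collapse into three segments, and each transition is stored as a small pair of integers rather than a full path, which hints at why a displacement-based representation can be compact.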

Original language: English
Title of host publication: Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024
Publisher: IEEE Computer Society
Pages: 4002-4015
Number of pages: 14
ISBN (Electronic): 9798350317152
Publication status: Published - 2024
Externally published: Yes
Event: 40th IEEE International Conference on Data Engineering, ICDE 2024 - Utrecht, Netherlands
Duration: 13 May 2024 → 17 May 2024

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627
ISSN (Electronic): 2375-0286

Conference

Conference: 40th IEEE International Conference on Data Engineering, ICDE 2024
Country/Territory: Netherlands
City: Utrecht
Period: 13/05/24 → 17/05/24

Keywords

  • feature-based trace regeneration
  • trace feature extraction
  • trace-based evaluation
