TY - GEN
T1 - Tier-scrubbing
T2 - 57th ACM/IEEE Design Automation Conference, DAC 2020
AU - Zhang, Ji
AU - Wang, Yuanzhang
AU - Wang, Yangtao
AU - Zhou, Ke
AU - Sebastian, Schelter
AU - Huang, Ping
AU - Cheng, Bin
AU - Ji, Yongguang
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - Sector errors are a common type of error in modern disks. A sector error that occurs during I/O operations might cause inaccessibility of an application. Even worse, it could result in permanent data loss if the data is being reconstructed, and thereby severely affects the reliability of a storage system. Many disk scrubbing schemes have been proposed to solve this problem. However, existing approaches have several limitations. First, schemes use machine learning (ML) to predict latent sector errors (LSEs), but only leverage a single snapshot of training data to make a prediction, and thereby ignore sequential dependencies between different statuses of a hard disk over time. Second, they accelerate the scrubbing at a fixed rate based on the results of a binary classification model, which may result in unnecessary increases in scrubbing cost. Third, they naively accelerate the scrubbing of the full disk which has LSEs based on the predictive results, but neglect partial high-risk areas (the areas that have a higher probability of encountering LSEs). Lastly, they do not employ strategies to scrub these high-risk areas in advance based on I/O accesses patterns, in order to further increase the efficiency of scrubbing.We address these challenges by designing a Tier-Scrubbing (TS) scheme that combines a Long Short-Term Memory (LSTM) based Adaptive Scrubbing Rate Controller (ASRC), a module focusing on sector error locality to locate high-risk areas in a disk, and a piggyback scrubbing strategy to improve the reliability of a storage system. Our evaluation results on realistic datasets and workloads from two real world data centers demonstrate that TS can simultaneously decrease the Mean-Time-To-Detection (MTTD) by about 80% and the scrubbing cost by 20%, compared to a state-of-the-art scrubbing scheme.
AB - Sector errors are a common type of error in modern disks. A sector error that occurs during I/O operations might cause inaccessibility of an application. Even worse, it could result in permanent data loss if the data is being reconstructed, and thereby severely affects the reliability of a storage system. Many disk scrubbing schemes have been proposed to solve this problem. However, existing approaches have several limitations. First, schemes use machine learning (ML) to predict latent sector errors (LSEs), but only leverage a single snapshot of training data to make a prediction, and thereby ignore sequential dependencies between different statuses of a hard disk over time. Second, they accelerate the scrubbing at a fixed rate based on the results of a binary classification model, which may result in unnecessary increases in scrubbing cost. Third, they naively accelerate the scrubbing of the full disk which has LSEs based on the predictive results, but neglect partial high-risk areas (the areas that have a higher probability of encountering LSEs). Lastly, they do not employ strategies to scrub these high-risk areas in advance based on I/O accesses patterns, in order to further increase the efficiency of scrubbing.We address these challenges by designing a Tier-Scrubbing (TS) scheme that combines a Long Short-Term Memory (LSTM) based Adaptive Scrubbing Rate Controller (ASRC), a module focusing on sector error locality to locate high-risk areas in a disk, and a piggyback scrubbing strategy to improve the reliability of a storage system. Our evaluation results on realistic datasets and workloads from two real world data centers demonstrate that TS can simultaneously decrease the Mean-Time-To-Detection (MTTD) by about 80% and the scrubbing cost by 20%, compared to a state-of-the-art scrubbing scheme.
UR - http://www.scopus.com/inward/record.url?scp=85085232431&partnerID=8YFLogxK
U2 - 10.1109/DAC18072.2020.9218551
DO - 10.1109/DAC18072.2020.9218551
M3 - Conference contribution
AN - SCOPUS:85085232431
T3 - Proceedings - Design Automation Conference
BT - 2020 57th ACM/IEEE Design Automation Conference, DAC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 July 2020 through 24 July 2020
ER -