TY - GEN
T1 - Multidimensional Features Helping Predict Failures in Production SSD-Based Consumer Storage Systems
AU - Zhang, Xinyan
AU - Tan, Zhipeng
AU - Feng, Dan
AU - He, Qiang
AU - Ju, Wan
AU - Hao, Jiang
AU - Zhang, Ji
AU - Yang, Lihua
AU - Qi, Wenjie
N1 - Publisher Copyright: © 2023 EDAA.
PY - 2023
Y1 - 2023
AB - As SSD failures seriously lead to data loss and service interruption, proactive failure prediction is often used to improve system availability. However, the unidimensional SMART-based prediction models hardly predict all drive failures. Some other features applied in data centers and enterprise storage systems are not readily available in consumer storage systems (CSS). To further analyze related failures in production SSD-based CSS, we study nearly 2.3 million SSDs from 12 drive models based on a dataset of SMART logs, trouble tickets, and error logs. We discover that SMART, Firmware Version, WindowsEvent, and Blue Screen of Death (SFWB) are closely related to SSD failures. We further propose a multidimensional-based failure prediction approach (MFPA), which is portable in algorithms, SSD vendors, and PC manufacturers. Experiments on the datasets show that SFWB-based MFPA achieves a high true positive rate (98.18%) and low false positive rate (0.56%), which is 4% higher and 86% lower than the SMART-based model. It is robust and can continuously predict for 2-3 months without iteration, substantially improving the system availability.
KW - failure prediction
KW - machine learning
KW - multidimensional features
KW - SSD
KW - system availability
UR - http://www.scopus.com/inward/record.url?scp=85162624237&partnerID=8YFLogxK
U2 - 10.23919/DATE56975.2023.10137082
DO - 10.23919/DATE56975.2023.10137082
M3 - Conference contribution
AN - SCOPUS:85162624237
T3 - Proceedings - Design, Automation and Test in Europe, DATE
BT - 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 Design, Automation and Test in Europe Conference and Exhibition, DATE 2023
Y2 - 17 April 2023 through 19 April 2023
ER -