TY - GEN
T1 - Managing System Failure Risk
T2 - 9th International Symposium on System Security, Safety, and Reliability, ISSSR 2023
AU - Qiu, Qingan
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - The occurrence of failures in safety-critical systems can result in severe consequences, including loss of life and significant economic impact. Therefore, it is essential to establish effective risk control policies to enhance system survivability. While traditional approaches focus on preventive maintenance, which may be time-consuming and impractical during continuous mission execution, this research proposes an alternative approach. By leveraging the relationship between system performance levels and degradation behavior, opportunities arise for controlling system deterioration through dynamic performance adjustment. Mission abort is also explored as an intuitive way to mitigate safety hazards. To achieve flexible risk control during mission execution, this study dynamically adjusts performance levels and mission abort decisions based on the deterioration level and amount of remaining work. The problem is formulated within the framework of a Markov decision process, and optimal policies are derived by analyzing structural properties. Comparative evaluations of heuristic policies are conducted to provide insights, and it is demonstrated that optimal performance control and mission abort policies exhibit a threshold structure, dependent on the performance level and degradation process. The utilization of condition information for dynamic adjustments offers potential for reducing failure risks and operational costs in safety-critical systems.
KW - condition monitoring
KW - mission abort
KW - mission reliability
KW - performance control
KW - system survivability
UR - http://www.scopus.com/inward/record.url?scp=85170028685&partnerID=8YFLogxK
U2 - 10.1109/ISSSR58837.2023.00018
DO - 10.1109/ISSSR58837.2023.00018
M3 - Conference contribution
AN - SCOPUS:85170028685
T3 - Proceedings - 2023 9th International Symposium on System Security, Safety, and Reliability, ISSSR 2023
SP - 55
EP - 61
BT - Proceedings - 2023 9th International Symposium on System Security, Safety, and Reliability, ISSSR 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 June 2023 through 11 June 2023
ER -