Abstract
Microservice architecture is a popular approach to constructing applications from a set of small, independent services in cloud environments, offering high cohesion, high availability, low coupling, and good scalability. Because a microservice system comprises a large number of independent services, a fault originating in a single service can propagate to multiple services and eventually degrade overall system performance and Quality of Service (QoS). It is therefore crucial to diagnose runtime system faults efficiently and autonomously. However, the complexity and dynamism of microservice systems and cloud environments pose unique challenges to identifying faults and localizing their root causes precisely and robustly. In this paper, we propose an Autonomous Model Selection-Ensemble-Stacking (AMSES) framework for microservice system fault identification. The proposed framework automatically selects, ensembles, and stacks optimal models from a pool of candidate unsupervised detection models to identify different fault types robustly. In addition, AMSES adaptively localizes faulty services using an automatically selected root cause localization model. Moreover, by exploiting the fault degree and causal inferring score, it diagnoses detected system faults precisely and interpretably. To evaluate its effectiveness, we empirically compare AMSES with state-of-the-art models on three kinds of faults on two microservice benchmarks: Sock-Shop and Train-Ticket. The experimental results show that AMSES achieves average macro-F1 scores of 87.1% and 91.4% for fault type identification on Sock-Shop and Train-Ticket, respectively. AMSES also outperforms its competitors in root cause localization, with an average Avg@5 of 0.856 on Sock-Shop and 0.633 on Train-Ticket.
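For readers unfamiliar with the macro-F1 metric reported above, the following is a minimal sketch of how it is conventionally computed: the per-class F1 scores are averaged with equal weight, so rare fault types count as much as common ones. The function name `macro_f1` and the fault labels (`cpu`, `memory`, `latency`) are hypothetical and chosen only for illustration; the abstract does not name the three fault types, and this is not the authors' evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro-F1 definition)."""
    # Consider every label that appears in the ground truth or the predictions.
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical fault-type labels, for illustration only.
y_true = ["cpu", "cpu", "memory", "latency"]
y_pred = ["cpu", "memory", "memory", "latency"]
score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.4f}")
```

In this toy example the per-class F1 scores are 2/3 for `cpu`, 2/3 for `memory`, and 1 for `latency`, so the macro average is 7/9.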
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2025 IEEE International Conference on Web Services, ICWS 2025 |
| Editors | Rong N. Chang, Carl K. Chang, Jingwei Yang, Nimanthi Atukorala, Dan Chen, Sumi Helal, Sasu Tarkoma, Qiang He, Tevfik Kosar, Claudio Agostino Ardagna, Amin Beheshti, Bo Cheng, Walid Gaaloul |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 814-824 |
| Number of pages | 11 |
| Edition | 2025 |
| ISBN (Electronic) | 9798331555634 |
| DOIs | |
| Publication status | Published - 2025 |
| Externally published | Yes |
| Event | 2025 IEEE International Conference on Web Services, ICWS 2025 - Helsinki, Finland; Duration: 7 Jul 2025 → 12 Jul 2025 |
Conference
| Conference | 2025 IEEE International Conference on Web Services, ICWS 2025 |
|---|---|
| Country/Territory | Finland |
| City | Helsinki |
| Period | 7/07/25 → 12/07/25 |
Keywords
- Autonomic System Fault Diagnosis
- Autonomous Model Construction
- Fault Identification
- Microservice Architecture
- Root Cause Localization