LLM-Enhanced Intelligent Fault Diagnosis and Self-Healing Framework for Cloud Computing Systems

Tailai Song
Wei Zhang
Sheng-Ning Lang
Hao Yan


Abstract

Existing methods for fault detection in cloud and quantum systems are powerful but brittle. They struggle with previously unseen failures, rely on inflexible recovery playbooks, and apply fixed quantum error correction (QEC) schemes, which is a significant limitation in heterogeneous multi-cloud settings. To address these issues, we introduce Intelligent Multi-Cloud Fault Detection with Adaptive Quantum Error Correction. The framework rests on three pillars: hierarchical multi-agent learning, adaptive multi-cloud execution, and predictive QEC. Specialized agents learn from experience, while the system adapts to real-time cloud performance and quantum error states. We evaluate the framework on the CloudSim Fault Injection Dataset, the Multi-Cloud Performance Benchmark, and IBM Quantum Error Logs. It achieves 94.2% detection accuracy while reducing false positives by 68%. System availability rises from 85% to 96.1%, and recovery time drops from 340 s to 45 s. For quantum workloads, the framework reaches a 96.7% success rate with 94.3% state fidelity. This work offers a more robust and adaptive approach to fault management in today's complex hybrid cloud-quantum environments.

Version 2 published: 10.20944/preprints202601.0630.v2 (Jan 9, 2026)
Version 1 published: 10.20944/preprints202601.0630.v1 (Jan 8, 2026)

