LLM-Enhanced Intelligent Fault Diagnosis and Self-Healing Framework for Cloud Computing Systems

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Existing methods for fault detection in cloud and quantum systems are powerful but brittle. They struggle with unknown failures, rely on inflexible recovery playbooks, and use fixed quantum error correction (QEC) schemes, a significant problem in diverse multi-cloud settings. To overcome these issues, we introduce \textbf{Intelligent Multi-Cloud Fault Detection with Adaptive Quantum Error Correction}. Our framework is built on three pillars: hierarchical multi-agent learning, adaptive multi-cloud execution, and predictive QEC. Specialized agents learn from experience, while the system adapts to real-time cloud performance and quantum error states. How effective is this approach? Testing on the CloudSim Fault Injection Dataset, Multi-Cloud Performance Benchmark, and IBM Quantum Error Logs shows its real-world impact. We achieved 94.2\% detection accuracy, cutting false positives by 68\%. System availability jumped from 85\% to 96.1\%, and recovery time plummeted from 340s to just 45s. For quantum workloads, the framework reached a 96.7\% success rate with 94.3\% state fidelity. This work offers a more robust and adaptive solution for fault management in today's complex hybrid cloud-quantum environments.

Article activity feed