Hybrid Deep Learning for Fail-Slow Disk Detection in the FSA Benchmark
Abstract
Fail-slow disks, whose performance degrades gradually before an outright failure, are increasingly common in large-scale cloud storage systems. Our work builds upon the FSA-benchmark dataset (PERSEUS), which contains approximately 100 billion data points collected from over 300,000 disks across 25 clusters. Initial experiments with traditional machine learning models such as XGBoost, Random Forest, and SVM, along with sequence models such as LSTM, have shown mixed results in detecting fail-slow conditions (failure rates ranging from 3.33% for the Autoencoder to 96.67% for the SVM). However, these approaches struggle to capture the complex, high-frequency correlations in disk metrics that precede a fail-slow event. This research proposes a hybrid deep learning framework that combines convolutional-recurrent layers with self-attention mechanisms to better model both spatial and temporal dependencies in the 15-second-interval performance metrics. The proposed architecture ingests multivariate time windows (look-back periods of 1-15 days) and outputs real-time probabilities of impending fail-slow conditions. We evaluate our approach on the same Cluster A and B splits used in the original PERSEUS study, using precision, recall, AUC-ROC, and Time-to-Alert as key metrics. Preliminary experiments demonstrate promising results, with the LSTM model achieving a 28% failure rate and the Autoencoder showing exceptional specificity (3.33% failure rate). The proposed hybrid architecture builds upon these foundations by integrating transformer-based mechanisms to better capture long-range dependencies in disk performance data.
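The abstract describes the hybrid convolutional-recurrent-attention architecture only at a high level. The following is a minimal PyTorch sketch of one plausible realization under stated assumptions: the metric count (n_metrics=8), channel and hidden sizes, and the 1-day example window are illustrative choices, not values taken from the paper.

```python
import torch
import torch.nn as nn

class HybridFailSlowDetector(nn.Module):
    """Sketch: Conv1d front-end -> bidirectional LSTM -> self-attention -> sigmoid head."""

    def __init__(self, n_metrics: int = 8, conv_channels: int = 32,
                 lstm_hidden: int = 64, n_heads: int = 4):
        super().__init__()
        # 1-D convolutions capture short-range correlations across the
        # 15-second samples within a window.
        self.conv = nn.Sequential(
            nn.Conv1d(n_metrics, conv_channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(conv_channels, conv_channels, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # Recurrent layer models temporal dependencies over the look-back window.
        self.lstm = nn.LSTM(conv_channels, lstm_hidden, batch_first=True,
                            bidirectional=True)
        # Self-attention re-weights time steps so long-range precursors of a
        # fail-slow event are not washed out by the recurrence.
        self.attn = nn.MultiheadAttention(embed_dim=2 * lstm_hidden,
                                          num_heads=n_heads, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(2 * lstm_hidden, 1),
            nn.Sigmoid(),  # probability of an impending fail-slow condition
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window_length, n_metrics)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # Conv1d wants (B, C, T)
        h, _ = self.lstm(h)
        h, _ = self.attn(h, h, h)
        return self.head(h.mean(dim=1)).squeeze(-1)       # (batch,) probabilities

# Example: a 1-day look-back window at 15-second intervals = 5760 samples.
model = HybridFailSlowDetector(n_metrics=8)
window = torch.randn(4, 5760, 8)   # batch of 4 hypothetical disk windows
print(model(window).shape)         # torch.Size([4]) -> per-disk fail-slow probabilities
```

In this sketch the attention block sits after the recurrence, so the pooled representation can still emphasize time steps far back in the window; the paper's transformer-based variant may arrange these components differently.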