Cognitive erasure-coded data update and repair for mitigating I/O overhead
Abstract
In erasure-coded storage systems, updating data necessitates parity updates to maintain data consistency, which leads to I/O amplification due to "write-after-read" operations. Additionally, the scattered storage of parity updates imposes significant disk seek overhead during data repair. To address these challenges, this paper proposes a Cognitive Update and Repair Method (CURM), which uses machine learning to classify files into write-only, read-only, and read-write categories, enabling customized update and repair strategies. For write-only and read-write files, CURM utilizes data difference and fine-grained I/O scheduling to reduce I/O overhead. Furthermore, CURM reserves disk space adjacent to parity chunks for read-write files, enabling efficient parallel reads and minimizing seek cost during repair. We implement CURM in a prototype storage system and evaluate its performance using real-world NFS and MSR workloads on a 25-node cluster. Experimental results show that CURM improves data update throughput by up to 82.52% and reduces data recovery time by up to 47.47%, while achieving lower storage overhead compared to state-of-the-art approaches including FL, PL, PLR, and PARIX. These results demonstrate CURM’s effectiveness in enhancing both update and recovery performance for large-scale erasure-coded storage systems.
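To make the "write-after-read" cost concrete, the following is a minimal sketch of a delta-based parity update. It is not CURM's implementation: it assumes a single XOR parity chunk (RAID-5 style) rather than the Reed-Solomon-type codes typically used in erasure-coded storage, and the names `xor_bytes`, `encode_parity`, and `delta_update` are hypothetical helpers introduced only for illustration. The point it shows is that the new parity can be computed from the old data chunk, the new data chunk, and the old parity alone, which is why the old values must be read before the write.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(data_chunks: list[bytes]) -> bytes:
    """Full encode: XOR all data chunks together (assumed single-parity code)."""
    return reduce(xor_bytes, data_chunks)


def delta_update(old_chunk: bytes, new_chunk: bytes, old_parity: bytes) -> bytes:
    """Update parity from the data difference instead of re-encoding the stripe.

    Requires reading old_chunk and old_parity first: the "write-after-read".
    """
    delta = xor_bytes(old_chunk, new_chunk)  # data difference
    return xor_bytes(old_parity, delta)      # fold the difference into the parity


# Illustrative stripe with three data chunks (made-up contents).
chunks = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode_parity(chunks)

# Overwrite chunk 1 and update the parity using only the delta.
new_chunk = b"ZZZZ"
new_parity = delta_update(chunks[1], new_chunk, parity)
chunks[1] = new_chunk

# Repair: rebuild chunk 0 from the surviving chunks and the updated parity.
recovered = xor_bytes(xor_bytes(chunks[1], chunks[2]), new_parity)
```

In this simplified setting the delta-updated parity equals a full re-encode of the stripe, and a lost chunk can still be rebuilt from the survivors. With a general linear code, the delta would additionally be multiplied by the chunk's encoding coefficient in the underlying finite field before being applied to each parity chunk.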