Stage-Wise SOH Prediction Using an Improved Random Forest Regression Algorithm
Abstract
In complex energy storage operating scenarios, batteries seldom undergo the complete charge–discharge cycles required for periodic capacity calibration. Methods based on accelerated aging experiments can indicate possible aging paths; however, because of uncertainties such as changing operating conditions, environmental variations, and manufacturing inconsistencies, the degradation information obtained from such experiments may not apply to the entire lifecycle. To address this, we developed a stage-wise state-of-health (SOH) prediction approach that combines offline training with online updating. During the offline training phase, multiple single-cell experiments were conducted under various combinations of depth of discharge (DOD) and C-rate. Multi-dimensional health features (HFs) were extracted, and an accelerated aging probability, pAA, was defined. Based on the correlation statistics between the HFs, kHF, the SOH, and pAA, all cells in the dataset were divided into general early, middle, and late aging stages. Within each stage, cells were further classified by longevity (long, medium, and short), and a separate model was trained offline for each category. The results show that models trained on cells following similar aging paths perform significantly better than a single model trained on all data combined. In addition, HF optimization was performed via a three-step process: an initial screening based on expert knowledge, a second screening using Spearman correlation coefficients, and an automatic feature importance ranking using a random forest regression (RFR) model. The proposed method is innovative in the following ways: (1) The stage-wise multi-model strategy significantly improves the SOH prediction accuracy across the entire lifecycle, keeping the mean absolute percentage error (MAPE) within 1%. (2) The improved model provides uncertainty quantification, issuing a warning signal at least 50 cycles before the onset of accelerated aging. (3) Analyzing the feature importance from the model outputs allows the primary aging mechanisms at different stages to be identified indirectly. (4) The model is robust against missing or low-quality HFs: if certain features cannot be obtained or are of poor quality, the prediction process does not fail.
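The Spearman-based second screening step and the MAPE metric named in the abstract can be illustrated with a minimal, standard-library-only sketch. The function names, the 0.8 threshold, and the toy data are assumptions made for illustration; the abstract does not state the threshold or the implementation the authors used.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (var_x * var_y) ** 0.5


def screen_features(features, soh, threshold=0.8):
    """Keep features whose |Spearman rho| with SOH meets the threshold."""
    return [name for name, vals in features.items()
            if abs(spearman(vals, soh)) >= threshold]


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


# Toy data (hypothetical): SOH falls monotonically over six checkpoints.
soh = [1.00, 0.98, 0.95, 0.93, 0.90, 0.86]
features = {
    "hf_monotonic": [10.0, 9.5, 9.1, 8.0, 7.2, 6.9],  # tracks SOH closely
    "hf_noisy": [3.0, 1.0, 4.0, 1.5, 5.0, 2.0],       # weakly related
}
selected = screen_features(features, soh)
error = mape([1.00, 0.95, 0.90], [0.99, 0.96, 0.90])
```

Because Spearman correlation works on ranks, it rewards any monotonic relation between a health feature and the SOH, not only a linear one, which is a common reason to prefer it for screening degradation indicators.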
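As a rough illustration of how a random forest regressor can both rank health features and yield an uncertainty estimate, the sketch below uses scikit-learn's `RandomForestRegressor` on synthetic data. The data, the use of per-tree spread as an uncertainty proxy, and the `3.0 * median` warning rule are all assumptions for demonstration; the abstract does not describe how the improved RFR model quantifies uncertainty or triggers its warning.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 300 samples, 4 health features.
n_samples = 300
X = rng.uniform(0.0, 1.0, size=(n_samples, 4))
# By construction, only the first two features drive the toy SOH signal.
soh = 1.0 - 0.15 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(0.0, 0.002, n_samples)

X_train, y_train = X[:240], soh[:240]
X_test, y_test = X[240:], soh[240:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Impurity-based importance ranking, most important feature first.
importance_rank = np.argsort(model.feature_importances_)[::-1]

# Per-tree predictions: the spread across trees serves here as a simple
# uncertainty proxy (an assumption, not necessarily the paper's method).
per_tree = np.stack([tree.predict(X_test) for tree in model.estimators_])
mean_pred = per_tree.mean(axis=0)
std_pred = per_tree.std(axis=0)

# Hypothetical warning rule: flag samples whose spread is unusually large.
warn = std_pred > 3.0 * np.median(std_pred)

# Mean absolute percentage error of the ensemble mean, in percent.
mape = 100.0 * np.mean(np.abs((y_test - mean_pred) / y_test))
```

In a stage-wise setup like the one described, one such model would be fitted per aging stage and longevity class, and the importance ranking of each model could then be compared across stages.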