Comparative Analysis of CNN and LSTM for Bearing Fault Mode Classification and Causality Through Representation Analysis
Abstract
This study investigates how the clarity of frequency-domain characteristics in vibration signals affects the performance of deep learning models for bearing fault classification. Two datasets were used: the CWRU benchmark dataset, which exhibits distinct and easily separable spectral signatures across fault modes, and a custom low-speed bearing dataset in which small defects do not significantly alter the frequency spectrum. To enable a clear and interpretable comparison, simplified CNN and LSTM architectures with a single core layer were deliberately employed. This design choice allows performance differences to be attributed to the inherent learning mechanisms of each architecture rather than to model complexity. Representation analysis shows that LSTM-F achieves the highest accuracy when the dataset contains clearly distinguishable spectral patterns, as in the CWRU case. In contrast, CNN-S outperforms both LSTM models on the custom experimental dataset, where fault-induced frequency characteristics are weak or ambiguous. Further representation analyses reveal that LSTM-F relies on consistent frequency-indexed patterns, whereas CNN-S captures more complex time–frequency interactions, making it more robust under low-separability conditions. These findings indicate that the optimal deep learning architecture for bearing fault classification depends on the degree of frequency separability in the data: LSTM-F is preferable for severe faults with distinct spectral features, while CNN-S is more effective for minor defects or for systems exhibiting complex, weakly discriminative frequency behavior.
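The notion of "frequency separability" used above can be made concrete with a small numerical sketch. The example below is illustrative only and is not the authors' method or data: it builds synthetic vibration signals, computes their magnitude spectra with a naive DFT, and scores how well two classes separate using the largest per-bin Fisher ratio. The function names, the choice of Fisher ratio as the score, and all signal parameters (shaft bin, fault bin, fault amplitudes, noise level) are assumptions introduced here for illustration.

```python
import cmath
import math
import random


def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0..N/2-1 (adequate for short signals)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2)
    ]


def make_signal(rng, n, fault_amp, shaft_bin=5, fault_bin=20, noise_std=0.5):
    """Hypothetical vibration signal: shaft tone + fault tone + Gaussian noise."""
    return [
        math.sin(2 * math.pi * shaft_bin * t / n)
        + fault_amp * math.sin(2 * math.pi * fault_bin * t / n)
        + rng.gauss(0.0, noise_std)
        for t in range(n)
    ]


def make_spectra(rng, fault_amp, count=8, n=128):
    """Generate `count` noisy signals and return their magnitude spectra."""
    return [magnitude_spectrum(make_signal(rng, n, fault_amp))
            for _ in range(count)]


def spectral_separability(class_a, class_b):
    """Largest per-bin Fisher ratio between two sets of magnitude spectra.

    This is an assumed, illustrative score; the source does not define one.
    """
    best = 0.0
    for k in range(len(class_a[0])):
        a = [s[k] for s in class_a]
        b = [s[k] for s in class_b]
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        var_a = sum((v - mean_a) ** 2 for v in a) / len(a)
        var_b = sum((v - mean_b) ** 2 for v in b) / len(b)
        # Small constant guards against division by zero.
        best = max(best, (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12))
    return best


rng = random.Random(0)  # fixed seed so the sketch is reproducible
healthy = make_spectra(rng, fault_amp=0.0)
strong = spectral_separability(healthy, make_spectra(rng, fault_amp=1.0))
weak = spectral_separability(healthy, make_spectra(rng, fault_amp=0.05))
print(f"strong-defect separability: {strong:.2f}")
print(f"weak-defect separability:   {weak:.2f}")
```

Under these assumptions, the large-amplitude fault tone produces a spectral peak well above the noise floor and a high score, loosely mirroring the CWRU-like regime, while the small-amplitude tone is buried in noise, loosely mirroring the low-speed, small-defect regime. The sketch says nothing about which architecture performs better; it only illustrates the data property the comparison is organized around.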