Multi-Class Liver Disease Classification Using a Hybrid Deep Learning Framework Based on YOLO and CaiT Transformer Architectures
Abstract
Clinicians must distinguish liver disease patterns quickly, yet visual reading of scans is subjective and the classes often look alike. This work builds a two-part deep-learning tool: it first extracts local shape features with a YOLOv8m backbone, then lets a Class Attention Image Transformer (CaiT) attend to the whole image. The data came from Roboflow: 3,976 annotated liver images showing four structural lesions (steatosis, ballooning, inflammation, fibrosis), split 70%/20%/10%. The pipeline keeps YOLOv8m as the visual encoder, passes its feature tensor through an adapter so the Transformer can ingest it, sends the result to a CaiT-XS24-384 block that produces context vectors, and ends with a dense layer that outputs the class. Performance was measured with accuracy, balanced accuracy, precision, recall, F1-score, Cohen's kappa, and the confusion matrix. On the held-out test set, the hybrid model reached 95.00% accuracy, 95.03% balanced accuracy, 95.25% precision, 95.03% recall, 95.00% F1-score, and 93.33% Cohen's kappa. These numbers indicate high agreement between predicted and true labels and stable separation among the disease classes. In comparison with other methods, combining convolutional locality with Transformer-based class attention delivered solid performance while balancing semantic detail against classification reliability for liver disease images.
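The adapter step described above (turning a CNN feature tensor into tokens a Transformer can ingest) can be sketched in NumPy. The tensor shape, channel count, and embedding dimension below are illustrative assumptions, not values reported by the paper:

```python
import numpy as np

# Hypothetical backbone output: (batch, channels, height, width).
# C and the 12x12 grid are stand-ins for a YOLOv8m-like feature map.
B, C, H, W = 2, 576, 12, 12
D = 288  # assumed Transformer embedding dimension (illustrative)

feats = np.random.randn(B, C, H, W)

# Flatten the spatial grid into a token sequence: (B, H*W, C).
tokens = feats.reshape(B, C, H * W).transpose(0, 2, 1)

# Linear projection mapping the channel dim C to the embed dim D,
# i.e. a 1x1-convolution-style adapter.
W_proj = np.random.randn(C, D) / np.sqrt(C)
embedded = tokens @ W_proj  # (B, H*W, D), ready for class-attention layers

print(embedded.shape)  # (2, 144, 288)
```

Each of the 144 tokens corresponds to one spatial cell of the backbone's feature map, so the CaiT block can attend across the whole image while still seeing locally extracted features.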
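The evaluation metrics listed in the abstract can all be computed with scikit-learn. The snippet below is a minimal sketch on a toy four-class example (labels 0-3 standing in for steatosis, ballooning, inflammation, fibrosis); the label arrays are illustrative, not the paper's data:

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             precision_score, recall_score, f1_score,
                             cohen_kappa_score, confusion_matrix)

# Toy ground-truth and predicted labels; one class-2 image misread as class 3.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 2, 3, 3, 3]

acc   = accuracy_score(y_true, y_pred)
bacc  = balanced_accuracy_score(y_true, y_pred)        # mean per-class recall
prec  = precision_score(y_true, y_pred, average="macro")
rec   = recall_score(y_true, y_pred, average="macro")
f1    = f1_score(y_true, y_pred, average="macro")
kappa = cohen_kappa_score(y_true, y_pred)              # chance-corrected agreement
cm    = confusion_matrix(y_true, y_pred)               # 4x4 count matrix

print(acc, bacc, kappa)  # 0.875 0.875 0.833...
```

Balanced accuracy averages recall per class, so it stays informative even if the four lesion classes are unevenly represented, and Cohen's kappa discounts agreement expected by chance.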