HDFF-Net: A Hybrid Dual-Feature Fusion Network with Cross-Modal Attention for Automated Colposcopic Transformation Zone Classification


Abstract

Background

Cervical cancer screening through colposcopy depends critically on accurate classification of the transformation zone (TZ) according to the 2011 IFCPC nomenclature, because TZ type directly governs treatment eligibility (ablative versus excisional). Inter-rater agreement among trained colposcopists for TZ-type assignment is only κ ≈ 0.42–0.60, well below the κ ≥ 0.80 threshold considered reliable for clinical decision-making. Existing automated methods rely on either handcrafted feature pipelines or deep transfer learning in isolation, each with well-documented limitations on small clinical datasets.

Methods

We present HDFF-Net (Hybrid Dual-Feature Fusion Network), a dual-stream deep learning architecture that unifies multi-scale handcrafted texture descriptors with an EfficientNetB0 convolutional backbone through a bidirectional Cross-Modal Attention (CMA) fusion module. The handcrafted branch computes a 24,508-dimensional composite descriptor (MS-GLCM, MS-LBP, MS-HOG, an extended Gabor bank, and first-order statistical features) and applies a novel 1D Squeeze-and-Excitation (SE) attention module for learned feature recalibration. The CNN branch augments EfficientNetB0 with CBAM channel attention. Training incorporates SMOTEENN combined resampling, label-smoothing cross-entropy (ε = 0.1), AdamW optimisation, and three-stage progressive fine-tuning. A two-level stacked ensemble (HDFF-Net + SVM + XGBoost + RF, with a logistic-regression meta-learner trained via 5-fold out-of-fold stacking) is additionally proposed. All experiments were conducted on 366 acetic-acid colposcopic images (TZ1:TZ2:TZ3 = 232:61:73) with a held-out 30% test partition.

Results

HDFF-Net achieves 99.01% test accuracy, a macro-F1 of 98.94%, and a macro-AUC of 0.9985 (DeLong 95% CI: 0.9971–0.9998) on the held-out test set (n = 209). The stacked ensemble achieves 99.28% accuracy and a macro-AUC of 0.9987 (95% CI: 0.9974–0.9999). Both results are statistically superior to the best prior baseline (SVM: 97.13%; McNemar χ² = 18.7, p < 0.001, Bonferroni-corrected). Ablation analysis identifies CMA fusion as the single largest contributor (+1.34 percentage points over the SE-branch-only SVM), confirming the complementarity of handcrafted texture and CNN spatial representations.

Conclusions

HDFF-Net establishes a new state of the art for three-class IFCPC TZ-type classification, achieving an error rate (0.99%) below documented human expert disagreement rates. The GPU-optional SE + handcrafted branch (98.56% accuracy, CPU-only) is particularly relevant for deployment in low-resource clinical settings. The architecture, training strategy, and stacking framework are generalisable to other colposcopy classification tasks.
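The abstract does not specify how the 1D Squeeze-and-Excitation recalibration is applied to the flattened handcrafted descriptor. As a rough illustration of the general idea (a bottleneck projection followed by a sigmoid gate that re-weights each feature), the following NumPy sketch may be helpful; the function name, reduction ratio, and weight shapes are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def se_recalibrate_1d(x, w1, b1, w2, b2):
    """SE-style gating of a 1D handcrafted descriptor (illustrative only).

    x  : (d,)       composite feature vector (e.g. concatenated GLCM/LBP/HOG/Gabor stats)
    w1 : (d//r, d)  bottleneck ("squeeze") weights, reduction ratio r
    w2 : (d, d//r)  expansion ("excitation") weights
    Returns x scaled elementwise by a learned gate in (0, 1).
    """
    z = np.maximum(w1 @ x + b1, 0.0)             # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z + b2)))  # sigmoid per-feature gate
    return gate * x

# Toy usage: d = 8 features, reduction ratio r = 2
# (the paper's real descriptor is 24,508-dimensional).
rng = np.random.default_rng(0)
d, r = 8, 2
x = rng.normal(size=d)
w1, b1 = rng.normal(size=(d // r, d)), np.zeros(d // r)
w2, b2 = rng.normal(size=(d, d // r)), np.zeros(d)
y = se_recalibrate_1d(x, w1, b1, w2, b2)  # same shape as x, features re-weighted
```

Because the gate lies in (0, 1), each recalibrated feature is attenuated rather than amplified; in the full model these weights would be learned end-to-end with the rest of the network.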
