A multimodal cross-attention pathotranscriptome integration for enhanced survival prediction of oral squamous cell carcinoma

Abstract

Oral squamous cell carcinoma (OSCC) accounts for a substantial share of cancer mortality, with survival outcomes highly dependent on early diagnosis. While many approaches have been proposed for OSCC survival prediction, they often rely on unimodal data, which may be suboptimal. In this study, we introduce a unified cross-attention-based deep learning framework that integrates whole-slide histopathology images (WSIs) and transcriptomic data from OSCC patients for survival prediction. The framework employs an autoencoder for transcriptomic feature extraction and a state-of-the-art pathology foundation model, evaluated across five alternatives, to derive WSI embeddings. These embeddings are then integrated using cross-attention and concatenation within a Cox proportional hazards model. The multimodal approach outperformed nearly all unimodal counterparts, achieving a maximum concordance index of 0.780±0.059 with cross-attention and 0.766±0.050 with concatenation. The results indicate that pathotranscriptomic integration can improve survival prediction for OSCC patients. The implementation is available on GitHub at https://github.com/kountaydwivedi/multimodal fusion.git.
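The abstract describes cross-attention fusion of WSI and transcriptomic embeddings feeding a Cox proportional hazards model. The following is a minimal PyTorch sketch of that pattern, not the authors' implementation: all dimensions, module names, and the choice of transcriptomic features as the attention query are illustrative assumptions.

```python
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Illustrative sketch: fuse WSI patch embeddings and a transcriptomic
    vector via cross-attention, then score risk with a linear Cox head.
    Dimensions and the query/key assignment are assumptions, not the paper's."""

    def __init__(self, wsi_dim=768, rna_dim=256, embed_dim=256, n_heads=4):
        super().__init__()
        # Project both modalities into a shared embedding space.
        self.wsi_proj = nn.Linear(wsi_dim, embed_dim)
        self.rna_proj = nn.Linear(rna_dim, embed_dim)
        # Transcriptomic features query the WSI patch embeddings (assumption).
        self.cross_attn = nn.MultiheadAttention(embed_dim, n_heads,
                                                batch_first=True)
        # Cox-style head: a single linear unit produces a log-risk score.
        self.risk = nn.Linear(2 * embed_dim, 1)

    def forward(self, wsi_patches, rna_vec):
        # wsi_patches: (B, N_patches, wsi_dim); rna_vec: (B, rna_dim)
        wsi = self.wsi_proj(wsi_patches)              # (B, N, E)
        rna = self.rna_proj(rna_vec).unsqueeze(1)     # (B, 1, E)
        attended, _ = self.cross_attn(rna, wsi, wsi)  # (B, 1, E)
        # Concatenate attended WSI context with the transcriptomic embedding.
        fused = torch.cat([attended.squeeze(1), rna.squeeze(1)], dim=-1)
        return self.risk(fused).squeeze(-1)           # (B,) log-risk


def cox_ph_loss(log_risk, time, event):
    """Negative Cox partial log-likelihood (Breslow-style, no tie handling)."""
    order = torch.argsort(time, descending=True)   # descending time = risk sets
    lr, ev = log_risk[order], event[order]
    log_cumsum = torch.logcumsumexp(lr, dim=0)     # log-sum over each risk set
    return -((lr - log_cumsum) * ev).sum() / ev.sum().clamp(min=1)
```

A concatenation-only baseline, as also reported in the abstract, would simply skip the attention step and feed `torch.cat([wsi.mean(1), rna], dim=-1)` to the risk head.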

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
