A Comprehensive Multi-Dimensional Disease Similarity Computation Framework for miRNA-Disease Association Prediction
Abstract
High-throughput sequencing technologies have driven explosive growth in gene-disease association data, making the mining of potential molecular associations between diseases from large biomedical datasets a crucial direction in bioinformatics research. To address this, we propose and implement a gene-association-based disease similarity computation framework for bioinformatics mining. Using the latest miRNA-disease association dataset (HMDD v4.0) [1] as the core data source, the framework integrates gene-disease association information from the DisGeNET database [2] and the hierarchical structure of MeSH (Medical Subject Headings) to construct a multi-source heterogeneous disease association network. The objective is to provide a comprehensive and precise disease similarity prediction strategy for biomedical research through multi-dimensional data fusion. Technically, the framework combines three core similarity measures: target-based disease functional similarity, MeSH-based semantic similarity, and network topology-based Gaussian Interaction Profile (GIP) kernel similarity. Notably, the semantic similarity algorithm adds Information Content (IC) weights to the classical structural decay scheme, and the functional similarity algorithm introduces an exponential penalty term to handle the long-tail distribution of known associations. This penalty reduces the spurious (false-positive) associations that arise because common diseases are studied far more deeply than rare ones, so that disease similarity is quantified from multiple perspectives. Using the resulting similarity matrices, we carried out extensive evaluations (metrics including AUC, AUPR, and accuracy) with state-of-the-art deep learning models, such as the graph neural network TriFusion [9] and the Transformer architecture MDformer [8].
Experimental results show that this multi-dimensional similarity feature enhancement strategy effectively reveals potential biological associations between diseases, supporting its value for drug repositioning and for auxiliary prediction of rare diseases, and that it delivers robust, generalizable predictive performance with high scalability.
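Of the three similarity measures named in the abstract, the Gaussian Interaction Profile kernel is the only one with a widely used standard form, so it can be illustrated without guessing at the authors' specific design. The following is a minimal sketch of that standard form, not the framework's actual implementation. The function name `gip_similarity`, the bandwidth parameter `gamma_prime`, and the toy association matrix are all assumptions made for illustration. Each disease is represented by its binary row in a disease-miRNA association matrix, and similarity decays exponentially with the squared distance between rows, with the bandwidth normalized by the mean squared row norm.

```python
import math


def gip_similarity(assoc, gamma_prime=1.0):
    """Gaussian Interaction Profile kernel similarity between rows of `assoc`.

    assoc: list of equal-length 0/1 lists; row i is the (hypothetical)
           interaction profile of disease i over all miRNAs.
    gamma_prime: assumed bandwidth hyperparameter (1.0 is a common default).
    """
    n = len(assoc)
    # Normalize the bandwidth by the mean squared norm of the profiles.
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("association matrix contains no known associations")
    gamma = gamma_prime / mean_sq

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Squared Euclidean distance between the two interaction profiles.
            d2 = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = sim[j][i] = math.exp(-gamma * d2)
    return sim


# Toy example (illustrative data only): 3 diseases x 4 miRNAs.
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
S = gip_similarity(toy)
```

In this toy matrix the first two diseases share an identical profile and therefore have similarity 1, while the third shares no miRNA with them and receives a much lower score. The IC-weighted semantic similarity and the exponentially penalized functional similarity are not sketched here because the abstract does not give their formulas.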