Integrating Protein-protein Interaction Networks and Machine Learning to Identify Biomarkers of Cancer Onset
Abstract
Recent large-scale plasma proteomic studies have identified a set of biomarkers for diagnosing early cancer onset, but improving their predictive performance remains challenging. Most existing studies have treated proteins as independent markers, ignoring their functional interdependencies within biological networks. We hypothesized that protein-protein interaction (PPI) networks capture coordinated biological signals that can enhance predictive performance. We identified 1,605 high-confidence PPI pairs (corresponding to 1,155 unique proteins) from the STRING database (confidence score > 0.9). Plasma proteomic data for these pairs were extracted from a subset of 38,585 UK Biobank participants with Olink measurements (hereafter UKB-PPP). Univariate Cox regression (p < 0.05) combined with elastic-net machine learning models was used to build PPI-based predictive models for 23 cancer types; the seven cancer types with robust PPI associations were included in the final predictive model. In general, models that included proteomic features outperformed those based on age, sex, and lifestyle factors. Incorporating PPI-derived interaction features further improved prediction performance in three of the seven cancer models, with melanoma showing a significant improvement in C-index over the base model (delta C-index = 0.13). In summary, integrating PPI networks with proteomic models could provide predictive gains for specific cancer types and underscore the value of molecular interaction patterns as complementary biomarkers of cancer onset.
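Two steps of the pipeline described above can be sketched in plain Python: keeping PPI pairs above a STRING confidence threshold, and scoring models with a concordance index so that a "delta C-index" between a base model and a PPI-augmented model can be computed. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumptions: confidence scores are on a 0-1 scale (STRING download files store scores as integers from 0 to 1000, so they would be divided by 1000 first); the `interaction_feature` helper, which multiplies the two protein levels, is hypothetical, because the abstract does not say how PPI-derived features were built; the C-index is a simplified Harrell's estimator that skips pairs with tied follow-up times; and the toy data are invented. The univariate Cox screening and elastic-net Cox fitting are omitted and would in practice use a survival-analysis library.

```python
from itertools import combinations


def filter_ppi_pairs(edges, threshold=0.9):
    """Keep protein pairs whose confidence score exceeds `threshold`.

    `edges` holds (protein_a, protein_b, score) tuples, with scores ASSUMED
    to be on a 0-1 scale. Returns de-duplicated pairs and the unique proteins.
    """
    pairs = set()
    for a, b, score in edges:
        if score > threshold and a != b:
            pairs.add(tuple(sorted((a, b))))  # treat A-B and B-A as one pair
    proteins = {p for pair in pairs for p in pair}
    return sorted(pairs), proteins


def interaction_feature(levels_a, levels_b):
    """HYPOTHETICAL PPI feature: per-participant product of two protein levels."""
    return [x * y for x, y in zip(levels_a, levels_b)]


def harrell_c_index(times, events, risks):
    """Simplified Harrell's C: share of comparable pairs ordered correctly.

    A pair is comparable when the participant with the shorter follow-up had
    an event; it is concordant when that participant also has the higher
    predicted risk. Tied risks count 0.5; pairs with tied times are skipped.
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, later = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # censored earlier: the true ordering is unknown
        comparable += 1
        if risks[first] > risks[later]:
            concordant += 1.0
        elif risks[first] == risks[later]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy usage with invented data.
edges = [("P1", "P2", 0.95), ("P2", "P1", 0.95), ("P2", "P3", 0.40), ("P3", "P4", 0.92)]
pairs, proteins = filter_ppi_pairs(edges)
# pairs -> [('P1', 'P2'), ('P3', 'P4')]

times = [2.0, 5.0, 3.0, 8.0]   # follow-up time (e.g. years)
events = [1, 0, 1, 0]          # 1 = cancer diagnosed, 0 = censored
base_risk = [0.3, 0.5, 0.2, 0.1]   # hypothetical base-model risk scores
ppi_risk = [0.9, 0.2, 0.6, 0.1]    # hypothetical PPI-augmented risk scores
delta = harrell_c_index(times, events, ppi_risk) - harrell_c_index(times, events, base_risk)
# C-index: base 0.6, PPI-augmented 1.0, so delta is 0.4 on this toy data
```

Because the C-index depends only on how risk scores rank participants, the same function can compare any two models fitted on the same cohort, which is what a reported delta C-index summarizes.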