The Structural Code of Breast Cancer Proteoform: Alternative Splicing-driven Protein Isoform Variation and Functional Diversification
Abstract
Alternative splicing (AS) is widespread in cancer, yet current studies provide limited coverage of AS-derived proteoforms and lack a systematic, high-resolution atlas linking isoform sequence variation to structural remodeling and functional diversification across tumors. Here, we introduce the three-dimensional Structure Isoform Galaxy (3DisoGalaxy) platform, an isoform-resolved breast cancer knowledge base that integrates the transcriptome, translatome, and foldome to enable structure-grounded, large-scale computational analysis of AS-derived proteoforms. We integrated PacBio long-read RNA sequencing (Iso-Seq; n = 35), short-read RNA-seq from four breast cancer cohorts, and two ribosome profiling (Ribo-seq) datasets (n = 42) to curate full-length transcript variants and define translationally supported open reading frames (ORFs). Through stringent transcript- and translational-level curation across these datasets, we identified 123,395 transcript variants and 73,715 ORFs, from which we constructed a breast cancer foldome of high-quality protein structure models. A stringently quality-controlled subset of 46,601 structures was further annotated with structure-resolved motifs and organized into a structural similarity network; together, these resources enable structure-based functional concordance analyses and form 3DisoGalaxy.
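The abstract does not say how the structural similarity network is built. As a minimal sketch, assuming pairwise similarity scores between structure models are already available (for example TM-scores from a structural aligner) and that an edge is drawn whenever a score meets a cutoff, such a network and its clusters could be derived as follows. The function names, the 0.5 cutoff, and the toy isoform labels and scores are hypothetical and are not taken from 3DisoGalaxy.

```python
# Hypothetical input: pair_scores maps (structure_a, structure_b) to a
# similarity score in [0, 1], e.g. a TM-score. The 0.5 cutoff is an
# illustrative assumption, not a parameter reported for 3DisoGalaxy.
def build_similarity_network(pair_scores, threshold=0.5):
    """Return an undirected adjacency dict with an edge for each pair >= threshold."""
    graph = {}
    for (a, b), score in pair_scores.items():
        # Register both nodes even if the pair falls below the cutoff.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group nodes into connected components with an iterative depth-first search."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


# Toy scores between hypothetical isoform structure models.
scores = {
    ("iso_A", "iso_B"): 0.9,
    ("iso_B", "iso_C"): 0.7,
    ("iso_D", "iso_E"): 0.4,
    ("iso_A", "iso_D"): 0.2,
}
network = build_similarity_network(scores)
print(connected_components(network))
# [['iso_A', 'iso_B', 'iso_C'], ['iso_D'], ['iso_E']]
```

Connected components are only one way to group such a network; a real pipeline might use a community-detection method instead, and any cutoff would need calibration against the chosen similarity score.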
Graphical Abstract
3DisoGalaxy generates testable mechanistic hypotheses, exemplified by KRAS4A, a KRAS isoform that shows selective loss of motif instances relative to the canonical isoform, and by the nomination of a non-canonical AKT1 isoform that has the strongest triple-negative breast cancer (TNBC)-biased expression shift within the AKT1 family and a modest association with relapse-free survival. 3DisoGalaxy is accessible through an interactive web portal that provides integrated multi-omics results and 3D structure visualization.
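The KRAS4A example rests on comparing motif instances between an isoform and its canonical counterpart. A minimal sketch of such a comparison is shown below, assuming motif instances are given as 1-based inclusive spans on the canonical protein and that the canonical residues retained in the isoform are already known from an alignment; an instance is counted as lost if any residue of its span is absent. The input format, function names, and toy motif labels are hypothetical and do not reproduce the 3DisoGalaxy annotation pipeline.

```python
from collections import Counter

# Hypothetical input format: each motif instance is (motif_name, start, end),
# using 1-based inclusive coordinates on the canonical protein.
def lost_motif_instances(canonical_motifs, retained_positions):
    """Return instances whose span is not fully retained in the isoform."""
    lost = []
    for name, start, end in canonical_motifs:
        # An instance is treated as lost if any residue of its span is missing.
        if not all(pos in retained_positions for pos in range(start, end + 1)):
            lost.append((name, start, end))
    return lost


def motif_loss_summary(canonical_motifs, retained_positions):
    """Map each motif name to (instances lost, instances in canonical)."""
    total = Counter(name for name, _, _ in canonical_motifs)
    lost = Counter(
        name for name, _, _ in lost_motif_instances(canonical_motifs, retained_positions)
    )
    return {name: (lost[name], total[name]) for name in total}


# Toy example: the hypothetical isoform retains canonical residues 1-120 only.
motifs = [("motif_X", 10, 15), ("motif_X", 150, 155), ("motif_Y", 40, 44)]
retained = set(range(1, 121))
print(motif_loss_summary(motifs, retained))
# {'motif_X': (1, 2), 'motif_Y': (0, 1)}
```

A per-motif tally like this is what allows loss to be called selective (some motif types lose instances while others are untouched); the isoform-to-canonical residue mapping, which this sketch takes as given, would have to come from a sequence or structure alignment.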