Finetuning Foundation Models for Temporal Clinical Transcriptomics Data
Abstract
Background
Time-series clinical transcriptomic datasets offer the opportunity to gain insights into the dynamics of disease mechanisms and treatment responses. However, their utility in uncovering temporal patterns is often limited by high noise levels and small sample sizes. Leveraging gene embeddings from foundation models and incorporating gene interaction information can help address these challenges, improve gene network analysis, and enable the detection of subtle changes that drive disease progression or drug response.
Results
We finetuned gene embeddings from foundation models using healthy tissue gene expression data and used them in temporal graph neural networks (GNNs) to model gene expression of responders and non-responders to treatment in three disease datasets: ulcerative colitis, Crohn’s disease, and psoriasis. Application of our method to these datasets confirmed known mechanisms associated with drug action and identified key differences between activated and repressed pathways for responders and non-responders, including B-cell activation and mitochondria-related activity in ulcerative colitis patients.
Conclusion
Finetuning gene embeddings from foundation models provides richer context for modeling gene expression data than using the embeddings in their original, non-finetuned state. Even with small sample sizes, GNN-based temporal models outperform traditional methods: they detect known mechanisms of response and uncover roles for genes and mechanisms not previously known to be associated with response or non-response.
Code Availability
Code and data are available in a public GitHub repository: https://github.com/Sanofi-Public/GNN-Timeseries
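Illustrative sketch

The Results describe feeding gene embeddings into a temporal GNN to model expression over time. The snippet below is a minimal NumPy sketch of that general idea, not the authors' architecture and not taken from the repository above. It assumes a GCN-style normalized adjacency over a gene interaction graph, node features formed by concatenating a fixed gene embedding with the expression value at each time point, and a simple recurrent update. All names (`normalize_adjacency`, `temporal_gnn_forward`, `init_params`), the dimensions, and the toy ring graph are hypothetical, and the weights are random and untrained.

```python
import numpy as np


def normalize_adjacency(adj):
    """GCN-style symmetric normalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degree including self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def init_params(emb_dim, hidden_dim, seed=0):
    """Random, untrained weights (hypothetical shapes for this sketch)."""
    rng = np.random.default_rng(seed)
    return {
        "w_x": rng.normal(scale=0.1, size=(emb_dim + 1, hidden_dim)),
        "w_h": rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
        "w_out": rng.normal(scale=0.1, size=(hidden_dim,)),
    }


def temporal_gnn_forward(expr, gene_emb, adj, params):
    """Predict next-step expression for each gene.

    expr:     (T, G) expression values over T time points for G genes
    gene_emb: (G, D) fixed per-gene embeddings (assumed already finetuned)
    adj:      (G, G) symmetric gene interaction adjacency matrix
    Returns an array of shape (T - 1, G): one prediction per input step.
    """
    a_norm = normalize_adjacency(adj)
    hidden = np.zeros((expr.shape[1], params["w_h"].shape[0]))
    preds = []
    for t in range(expr.shape[0] - 1):
        # Node features: gene embedding plus expression at time t.
        x_t = np.concatenate([gene_emb, expr[t][:, None]], axis=1)
        # Graph convolution: aggregate neighbor features, then project.
        msg = a_norm @ x_t @ params["w_x"]
        # Simple recurrent update carrying state across time points.
        hidden = np.tanh(msg + hidden @ params["w_h"])
        preds.append(hidden @ params["w_out"])
    return np.stack(preds)


# Toy example: 4 time points, 5 genes on a ring-shaped interaction graph.
T, G, D, H = 4, 5, 8, 6
rng = np.random.default_rng(1)
expr = rng.normal(size=(T, G))
gene_emb = rng.normal(size=(G, D))
adj = np.zeros((G, G))
for i in range(G):
    adj[i, (i + 1) % G] = adj[(i + 1) % G, i] = 1.0

params = init_params(D, H)
preds = temporal_gnn_forward(expr, gene_emb, adj, params)
print(preds.shape)
```

In a real setting the weights would be trained, for example by minimizing the error between `preds` and `expr[1:]`, and the embeddings and graph would come from a foundation model and a curated interaction resource rather than random draws.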