LitGene: a transformer-based model that uses contrastive learning to integrate textual information into gene representations

Abstract

Representation learning approaches leverage sequence, expression, and network data, but utilize only a fraction of the rich textual knowledge accumulated in the scientific literature. We present LitGene, an interpretable transformer-based model that refines gene representations by integrating textual information. The model is enhanced through a Contrastive Learning (CL) approach that treats genes sharing a Gene Ontology (GO) term as semantically similar. LitGene demonstrates strong accuracy across eight benchmark tasks predicting protein properties and robust zero-shot learning capabilities, enabling the prediction of new potential disease risk genes in obesity, asthma, hypertension, and schizophrenia. LitGene's SHAP-based interpretability tool illuminates the basis for identified disease-gene associations, and an automated statistical framework gauges literature support for AI biomedical predictions, providing validation and improving reliability. By integrating textual and genetic information, LitGene mitigates data biases, enhances biomedical predictions, and promotes ethical AI practices through transparent, equitable, open, and evidence-based insights. LitGene code is open source and also available via a public web interface at litgene.avisahuai.com.
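The contrastive objective described above can be illustrated with a minimal sketch. This is an assumption-laden toy, not LitGene's actual implementation: embeddings are random stand-ins for transformer outputs, the gene names and GO annotations are hypothetical, and an InfoNCE-style loss is used to pull together genes that share a GO term while pushing apart those that do not.

```python
import numpy as np

# Toy setup: 4 hypothetical genes with random 8-dim embeddings
# standing in for transformer-derived text representations.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize

# Hypothetical GO annotations: G1/G2 and G3/G4 each share a term.
go_terms = {"G1": {"GO:0006915"}, "G2": {"GO:0006915"},
            "G3": {"GO:0008283"}, "G4": {"GO:0008283"}}
genes = list(go_terms)

def share_go(a, b):
    """Genes sharing at least one GO term count as a positive pair."""
    return bool(go_terms[a] & go_terms[b])

def contrastive_loss(emb, tau=0.1):
    """InfoNCE-style loss: for each anchor gene, maximize similarity
    to GO-sharing genes relative to all other genes in the batch."""
    sims = emb @ emb.T / tau  # cosine similarities scaled by temperature
    loss, n_pairs = 0.0, 0
    for i, gi in enumerate(genes):
        for j, gj in enumerate(genes):
            if i != j and share_go(gi, gj):
                logits = np.delete(sims[i], i)  # exclude self-similarity
                log_denom = np.log(np.exp(logits).sum())
                loss += -(sims[i, j] - log_denom)
                n_pairs += 1
    return loss / n_pairs

print(f"contrastive loss: {contrastive_loss(emb):.4f}")
```

Training on this loss would move embeddings of GO-sharing genes closer together; here the loss is simply evaluated once on the random embeddings.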
