Pretraining Improves Prediction of Genomic Datasets Across Species
Abstract
Recent studies suggest that deep neural network models trained on thousands of human genomic datasets can accurately predict genomic features, including gene expression and chromatin accessibility. However, training these models is computation- and time-intensive, and datasets of comparable size do not exist for most other organisms. Here, we identify modifications to an existing state-of-the-art model that improve model accuracy while reducing training time and computational cost. Using this streamlined model architecture, we investigate the ability of models pretrained on human genomic datasets to transfer performance to a variety of tasks. Models pretrained on human data but fine-tuned on genomic datasets from diverse tissues and species achieved significantly higher prediction accuracy while significantly reducing training time compared to models trained from scratch, with Pearson correlation coefficients between experimental results and predictions as high as 0.8. Further, we found that including excessive training tasks decreased model performance and that this compromised performance could be partially but not completely rescued by fine-tuning. Thus, simplifying model architecture, applying pretrained models, and carefully considering the number of training tasks may be effective and economical techniques for building new models across data types, tissues, and species.
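The transfer-learning strategy described above can be illustrated with a deliberately simplified toy model. The sketch below is illustrative only and does not reproduce the paper's architecture: a shared linear "encoder" is pretrained on a large source dataset (standing in for the human data), then frozen while only a small output head is fit on a much smaller target dataset (standing in for a new tissue or species). All function and variable names here are hypothetical, and performance is scored with the Pearson correlation coefficient, the metric quoted in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_ridge(X, y, lam=1e-2):
    # Closed-form ridge regression: (X^T X + lam*I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# "Pretraining": a large source dataset (analogous to abundant human data)
# determines the shared weights w_pre.
X_src = rng.normal(size=(2000, 50))
w_true = rng.normal(size=50)                  # hypothetical ground truth
y_src = X_src @ w_true + 0.1 * rng.normal(size=2000)
w_pre = fit_ridge(X_src, y_src)

# "Fine-tuning": a small target dataset (analogous to a data-poor species)
# whose labels are a rescaled version of the source signal. Only a scalar
# output head `a` is fit on top of the frozen pretrained predictor.
X_tgt = rng.normal(size=(30, 50))
y_tgt = 2.0 * (X_tgt @ w_true) + 0.1 * rng.normal(size=30)
z = X_tgt @ X_src.T @ np.zeros(2000) if False else X_tgt @ w_pre  # frozen features
a = (z @ y_tgt) / (z @ z)                     # 1-D least-squares head

# Baseline: train from scratch on the small target set alone.
w_scratch = fit_ridge(X_tgt, y_tgt)

# Evaluate both models by Pearson correlation on held-out target data.
X_test = rng.normal(size=(500, 50))
y_test = 2.0 * (X_test @ w_true)
r_pre = np.corrcoef(a * (X_test @ w_pre), y_test)[0, 1]
r_scr = np.corrcoef(X_test @ w_scratch, y_test)[0, 1]
print(f"pretrained+fine-tuned r={r_pre:.3f}, from-scratch r={r_scr:.3f}")
```

With only 30 target examples against 50 features, the from-scratch model overfits, while the pretrained encoder transfers the shared structure and the fine-tuned head recovers the target-specific rescaling, mirroring the abstract's finding that fine-tuning pretrained models outperforms training from scratch on small datasets.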