Limitations and Enhancements in Genomic Language Models: Dynamic Selection Approach
Abstract
Genomic Language Models (GLMs), which learn from nucleotide sequences, are crucial for understanding biological principles and excel in tasks such as sequence generation and classification. However, state-of-the-art models vary in training methods, architectures, and tokenization techniques, resulting in different strengths and weaknesses. We propose a multi-model fusion approach with a dynamic model selector that effectively integrates three models with distinct architectures. This fusion enhances predictive performance in downstream tasks, outperforming any individual model and achieving complementary advantages. Our comprehensive analysis reveals a strong correlation between model performance and motif prominence in sequences. Nevertheless, overreliance on motifs may limit the understanding of ultra-short core genes and the context of ultra-long sequences. Importantly, based on our in-depth experiments and analyses of the three current leading models, we identify unresolved issues and suggest potential future directions for the development of genomic models. The code, data, and pre-trained model are available at https://github.com/Jacob-S-Qiu/glm_dynamic_selection.
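The core idea of dynamic model selection, routing each input sequence to whichever model is best suited to it, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model names, the label outputs, and the length-based suitability scores are all hypothetical stand-ins for real GLMs and a learned selector.

```python
from typing import Callable, Dict


def select_and_predict(seq: str,
                       models: Dict[str, Callable[[str], str]],
                       scorer: Dict[str, Callable[[str], float]]) -> str:
    """Route the sequence to the model with the highest suitability
    score, then return that model's prediction."""
    best = max(models, key=lambda name: scorer[name](seq))
    return models[best](seq)


# Toy stand-ins for models with different strengths (hypothetical):
models = {
    "motif_model": lambda s: "promoter",
    "context_model": lambda s: "enhancer",
}

# Toy suitability scores: prefer the motif model on short sequences,
# the context model on long ones (a learned selector would replace this).
scorer = {
    "motif_model": lambda s: 1.0 if len(s) < 50 else 0.2,
    "context_model": lambda s: 0.5,
}

print(select_and_predict("ACGT" * 5, models, scorer))
print(select_and_predict("ACGT" * 100, models, scorer))
```

In practice the selector would be trained to predict, from sequence features such as motif prominence, which backbone model will perform best, rather than using a hand-written length rule.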