Bio-BLIP: A Multimodal Architecture for Transferable Reasoning in Genomic Variant Interpretation

Anvita Gupta
Alejandro Buendia
Anshul Kundaje
Jure Leskovec

Abstract

Developing scientific hypotheses in biology requires integrating heterogeneous evidence across DNA sequence, gene context, protein function, and prior literature. Existing multimodal AI systems expose biological evidence to reasoning models either through textification or by projecting biological embeddings into fine-tuned language models. However, these models are typically highly optimized for the specific set of tasks on which they are fine-tuned. Here we present Bio-BLIP, a multimodal, Q-former-based architecture that leverages biological embeddings and an LLM to generalize to complex reasoning tasks without task-specific fine-tuning. The key to Bio-BLIP is a new neural network architecture that brings together four data modalities (DNA, genes, proteins, and text) through a master Q-former model, which integrates the modality-specific information into a fixed-length prefix for the LLM backbone. Bio-BLIP is pretrained on the task of human genetic variant annotation and achieves a 29.8% increase over frontier LLMs in generating accurate variant features. We evaluate Bio-BLIP zero-shot on two downstream genomic tasks: variant prioritization and target gene prediction. Bio-BLIP outperforms two alignment-free genomic language models on regulatory variant prioritization for Mendelian disease. On the target gene prediction task, Bio-BLIP improves accuracy over LLMs by leveraging learned genomic variant knowledge in difficult cases. Our model also produces rich, transparent reasoning traces. In biological domains characterized by multiple scales of data and varied downstream tasks, Bio-BLIP offers a step toward natively multimodal, generalizable reasoning.
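The abstract's central idea, a set of learned queries that turns variable-length inputs from several modalities into a fixed-length prefix for an LLM, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-head cross-attention, the dimensions, the random weights, and every name below (`fixed_length_prefix`, `d_model`, `d_llm`, `n_queries`) are assumptions chosen for illustration, and the sketch assumes each modality has already been projected to a common embedding width.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fixed_length_prefix(modality_embeddings, queries, w_q, w_k, w_v, w_out):
    """Single-head cross-attention from learned queries to all modality tokens.

    Hypothetical helper: a Q-former-style layer reduced to its simplest form.
    """
    # Stack the token sequences of every modality: (total_tokens, d_model).
    tokens = np.concatenate(list(modality_embeddings.values()), axis=0)
    q = queries @ w_q                      # (n_queries, d_model)
    k = tokens @ w_k                       # (total_tokens, d_model)
    v = tokens @ w_v                       # (total_tokens, d_model)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)        # each query attends over all tokens
    fused = attn @ v                       # (n_queries, d_model)
    # Project into the (assumed) LLM embedding width.
    return fused @ w_out                   # (n_queries, d_llm)


rng = np.random.default_rng(0)
d_model, d_llm, n_queries = 16, 32, 8      # illustrative sizes only

# Random stand-ins for encoder outputs; token counts differ per modality.
modality_embeddings = {
    "dna": rng.normal(size=(50, d_model)),
    "gene": rng.normal(size=(3, d_model)),
    "protein": rng.normal(size=(20, d_model)),
    "text": rng.normal(size=(12, d_model)),
}
queries = rng.normal(size=(n_queries, d_model))   # would be learned in practice
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
w_out = rng.normal(size=(d_model, d_llm)) * 0.1

prefix = fixed_length_prefix(modality_embeddings, queries, w_q, w_k, w_v, w_out)
print(prefix.shape)  # (8, 32)
```

The output shape depends only on the number of queries and the LLM width, not on how many tokens each modality contributes, which is what makes the prefix fixed-length.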

Version published to 10.64898/2026.05.12.724740 on bioRxiv
May 15, 2026

Related articles

From nucleotides to semantics: genomic representation learning via joint-embedding predictive architecture

This article has 8 authors:
1. Chengsen Wang
2. Qi Qi
3. Haifeng Sun
4. Zirui Zhuang
5. Bo He
6. Siying Liu
7. Jianxin Liao
8. Jingyu Wang
This article has no evaluations. Latest version: Apr 6, 2026

A Multi-modal LLM-Knowledge Fusion Framework for Predicting Single-cell Genetic Perturbation Effects

This article has 9 authors:
1. Mingkun Lu
2. Nanxin You
3. Hongning Zhang
4. Lingyan Zheng
5. Bo Li
6. Wanghao Jiang
7. Yintao Zhang
8. Huaicheng Sun
9. Ying Zhou
This article has no evaluations. Latest version: Apr 28, 2026

A Generative Neuro-Symbolic AI for Protein Sequence Design

This article has 13 authors:
1. Marianne Defresne
2. Delphine Dessaux
3. Samuel Buchet
4. Lucie Barthe
5. Liza Ammar-Khodja
6. Bessam Azizi
7. Valentin Durante
8. Gianluca Cioci
9. Simon de Givry
10. Alain Roussel
11. Luis F. Garcia-Alles
12. Thomas Schiex
13. Sophie Barbe
This article has no evaluations. Latest version: Apr 2, 2026
