Decoding Complex Genotype-Phenotype Interactions by Discretizing the Genome

Jędrzej Kubica
Hetvi Jethwani
Krzysztof H. Banecki
Mauricio Moldes
Dariusz Plewczynski
Ben Busby

Abstract

Background: Despite the ease and affordability of genome sequencing in biomedical research, the genetic causes of many diseases or their subtypes remain unknown due to diverse biological mechanisms that complicate genotype-phenotype relationships. Most previous studies have focused on single variants or sets of variants presumed to be directly causal for the disease. However, incomplete penetrance, in which some individuals carry disease-associated variants yet exhibit no phenotype, suggests that these variants, the genomic background and other secondary factors combine to shape the susceptibility to the disease.

Results: Here, we introduce a new methodology for genotype-phenotype mapping based on genomic hashes, unique representations of local genomic background. Each hash corresponds to a haplotype-resolved set of variants within one recombination-defined genomic region (haploblock). We provide a practical guide for using genomic hashes to train machine learning models that link genomic background and specific variant sets to phenotypic outcomes. We implemented this framework as a ready-to-use bioinformatics pipeline capable of fast, scalable, hash-based genome comparison. The pipeline is available on GitHub: https://github.com/collaborativebioinformatics/Haploblock_Clusters_ElixirBH25

How it benefits the community: Genomic hashes offer a computationally efficient framework for large-scale genotype-phenotype mapping. By discretizing the genome into haploblocks, this approach will facilitate the search for causes of complex phenotypes across the entire genome and the prediction of precision prevention points and treatments.
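To make the idea of a genomic hash concrete, the following is a minimal sketch and not the pipeline's actual hashing scheme. It assumes phased variants given as (position, ref, alt, haplotype) tuples on a single chromosome, hypothetical haploblock start coordinates in `BLOCK_STARTS`, and a truncated SHA-256 digest as the hash; all of these names and choices are illustrative assumptions.

```python
import hashlib
from bisect import bisect_right

# Hypothetical haploblock start positions (sorted, 1-based) on one chromosome.
# Real boundaries would come from a recombination map; these are made up.
BLOCK_STARTS = [1, 10_000, 25_000]


def block_index(pos):
    """Return the index of the haploblock containing position `pos` (pos >= 1)."""
    return bisect_right(BLOCK_STARTS, pos) - 1


def haplotype_hashes(variants):
    """Map (block index, haplotype) to a short hash of the variants it carries.

    `variants` is an iterable of (pos, ref, alt, haplotype) tuples, with
    haplotype 0 or 1 for a phased diploid sample.
    """
    groups = {}
    for pos, ref, alt, hap in variants:
        groups.setdefault((block_index(pos), hap), []).append((pos, ref, alt))

    hashes = {}
    for key, items in groups.items():
        # Sort so the hash does not depend on the input order of the variants.
        text = ";".join(f"{pos}:{ref}>{alt}" for pos, ref, alt in sorted(items))
        hashes[key] = hashlib.sha256(text.encode()).hexdigest()[:16]
    return hashes


sample = [(500, "A", "G", 0), (12_000, "C", "T", 0), (700, "G", "A", 1)]
for (block, hap), digest in sorted(haplotype_hashes(sample).items()):
    print(f"block {block}, haplotype {hap}: {digest}")
```

In this sketch, two samples that carry the same variants on the same haplotype within a block receive identical hashes, so comparing local genomic backgrounds reduces to comparing short strings, and the hashes can serve as categorical features for a downstream model.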

Version published to 10.37044/osf.io/xhkc3_v1 on OSF Preprints
Dec 17, 2025

Related articles

A sensitive and accurate framework for population-scale structural variant discovery and genotyping across sequence types

This article has 4 authors:
1. Xin Wang
2. Guangbao Luo
3. Li Xiao
4. Zhangjun Fei
This article has no evaluations
Latest version Feb 18, 2026

Global Evaluation of Congenital Heart Disease-Associated Non-Coding Variants

This article has 27 authors:
1. José Rodríguez-Martínez
2. Edwin Peña-Martínez
3. Shreya Sharma
4. Joshua Medina-Feliciano
5. Elise Root
6. Lois Parks
7. Marissa Granitto
8. Diego Pomales-Matos
9. Jean Messon-Bird
10. Adriana Barreiro-Rosario
11. Leandro Sanabria-Alberto
12. Alejandro Rivera-Madera
13. Jessica Rodríguez-Ríos
14. Rosalba Velázquez-Roig
15. Juan Figueroa-Rosado
16. Mackenzie Noon
17. Omer Donmez
18. Carmy Forney
19. Hayley Hesse
20. Katelyn Dunn
21. Xiaoting Chen
22. Matthew Hass
23. Lucinda Lawson
24. Matthew Weirauch
25. Leah Kottyan
26. Steven Reilly
27. Devesh Bhimsaria
This article has no evaluations
Latest version Jan 7, 2026

Benchmark of open-access star-allele callers to accurately assess haplotypes and phenotypes in pharmacogenetic studies

This article has 3 authors:
1. Marc Gros La Faige
2. Emmanuelle Génin
3. Anthony Herzig
This article has no evaluations
Latest version Feb 18, 2026
