Inference and visualization of complex genotype-phenotype maps with gpmap-tools
Abstract
Understanding how biological sequences give rise to observable traits, that is, how genotype maps to phenotype, is a central goal in biology. Yet our knowledge of genotype-phenotype maps in natural systems is limited by the high dimensionality of sequence space and the context-dependent effects of mutations. The emergence of multiplex assays of variant effect (MAVEs), along with large collections of natural sequences, offers new opportunities to empirically characterize these maps at an unprecedented scale. However, tools for statistical and exploratory analysis of these high-dimensional data are still needed. To address this gap, we developed gpmap-tools (https://github.com/cmarti/gpmap-tools), a Python library that integrates a series of models for inference, phenotypic imputation, and error estimation from MAVE data or collections of natural sequences in the presence of genetic interactions of every possible order. gpmap-tools also provides methods for summarizing patterns of epistasis and for visualizing genotype-phenotype maps containing up to millions of genotypes. To demonstrate its utility, we used gpmap-tools to infer genotype-phenotype maps containing 262,144 variants of the Shine-Dalgarno sequence from both genomic 5′ UTR sequences and experimental MAVE data. Visualization of the inferred landscapes consistently revealed high-fitness ridges that link core motifs at different distances from the start codon. In summary, gpmap-tools provides a flexible, interpretable framework for studying complex genotype-phenotype maps, opening new avenues for understanding the architecture of genetic interactions and their evolutionary consequences.
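To give a sense of the scale involved, the figure of 262,144 variants matches the number of sequences of length 9 over a four-letter nucleotide alphabet (4^9 = 262,144). The sketch below enumerates such a space and lists the single-mutation neighbors of one genotype. It uses only the Python standard library and is not the gpmap-tools API; the alphabet `ACGU`, the length 9, the example sequence, and the helper names are assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet (illustrative)
LENGTH = 9         # assumed sequence length; 4**9 == 262144


def enumerate_genotypes(alphabet=ALPHABET, length=LENGTH):
    """Return every sequence of the given length over the alphabet."""
    return ["".join(letters) for letters in product(alphabet, repeat=length)]


def single_mutants(seq, alphabet=ALPHABET):
    """Return all sequences that differ from seq at exactly one position."""
    return [
        seq[:i] + allele + seq[i + 1:]
        for i in range(len(seq))
        for allele in alphabet
        if allele != seq[i]
    ]


genotypes = enumerate_genotypes()
n_genotypes = len(genotypes)

# Hypothetical example genotype; each genotype has length * (alphabet size - 1) neighbors.
n_neighbors = len(single_mutants("AGGAGGUAA"))

print(n_genotypes, n_neighbors)
```

Each genotype has 9 × 3 = 27 single-mutation neighbors, so even this modest sequence length yields a graph with hundreds of thousands of nodes and millions of edges.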