Classification and regression trees clarify the role of epistasis and environment in genotype–phenotype maps

Abstract

Understanding how genetic variation translates into phenotypic outcomes is central to many subfields of genetics. The task is difficult because several forces, including epistasis, environmental modulation of mutation effects, and ecological influences, complicate the mapping from genotype to phenotype. In this study, we apply a unified decision tree approach, classification and regression trees (CART), to model genotype-phenotype relationships in protein fitness landscapes from a diversity of organisms: (i) a fluorescent protein isolated from Entacmaea quadricolor (bubble-tip anemone), (ii) antifolate resistance in Plasmodium falciparum (malaria parasite) dihydrofolate reductase (DHFR) under drug concentration gradients, (iii) allelic variants from the long-term evolution experiment (LTEE) in Escherichia coli, (iv) proteostasis-modulated drug resistance phenotypes in three bacterial orthologues of DHFR, and (v) chemotypic diversification of sesquiterpene synthases in Nicotiana tabacum (cultivated tobacco). Our results demonstrate that decision trees can effectively capture higher-order interactions between mutations and environments, uncovering nonlinear dependencies and contingencies that traditional parametric models often miss. Because the fitted trees make interaction hierarchies easy to visualize, CART serves as both a predictive tool and an explanatory framework for genotype-phenotype mapping. The approach has applications ranging from resolving the genomic architecture of biological traits to personalized medicine and bioengineering.
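To illustrate how a regression tree can expose a mutation-by-environment contingency, here is a minimal sketch in plain Python. It is not the authors' pipeline or data: the two loci (`mut_A`, `mut_B`), the `drug` environment flag, and the `phenotype` function are all hypothetical, and the tree builder is a bare-bones greedy variance-reduction splitter written for this example rather than a full CART implementation (no pruning, binary features only).

```python
from itertools import product

# Hypothetical feature names: two mutations and one environmental factor.
FEATURES = ("mut_A", "mut_B", "drug")


def phenotype(a, b, drug):
    """Made-up landscape: mutations are mildly costly without drug,
    and strong resistance under drug requires both mutations (epistasis)."""
    if drug:
        return 0.2 + 0.1 * a + 0.7 * a * b
    return 1.0 - 0.1 * a - 0.1 * b


def sse(ys):
    """Sum of squared deviations from the mean (node impurity)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def build_tree(rows, depth=0, max_depth=3):
    """Greedy regression tree: at each node, split on the binary feature
    that minimises the summed SSE of the two child nodes."""
    ys = [y for _, y in rows]
    mean = sum(ys) / len(ys)
    if depth == max_depth or sse(ys) < 1e-12:
        return {"leaf": mean}
    best = None
    for j in range(len(FEATURES)):
        absent = [r for r in rows if r[0][j] == 0]
        present = [r for r in rows if r[0][j] == 1]
        if not absent or not present:
            continue  # feature is constant in this node
        cost = sse([y for _, y in absent]) + sse([y for _, y in present])
        if best is None or cost < best[0]:
            best = (cost, j, absent, present)
    if best is None:
        return {"leaf": mean}
    _, j, absent, present = best
    return {
        "feature": j,
        "absent": build_tree(absent, depth + 1, max_depth),
        "present": build_tree(present, depth + 1, max_depth),
    }


def predict(tree, x):
    """Follow the splits for one genotype-environment combination."""
    while "leaf" not in tree:
        tree = tree["present"] if x[tree["feature"]] else tree["absent"]
    return tree["leaf"]


def show(tree, indent=""):
    """Print the split hierarchy as indented text."""
    if "leaf" in tree:
        print(f"{indent}predict {tree['leaf']:.2f}")
        return
    name = FEATURES[tree["feature"]]
    print(f"{indent}{name} absent:")
    show(tree["absent"], indent + "  ")
    print(f"{indent}{name} present:")
    show(tree["present"], indent + "  ")


# Full factorial design: every genotype in every environment.
rows = [((a, b, d), phenotype(a, b, d)) for a, b, d in product((0, 1), repeat=3)]
tree = build_tree(rows)
show(tree)
```

In this toy landscape the first split falls on `drug`, and the effect of `mut_B` under drug depends on whether `mut_A` is present. Reading the printed hierarchy from the root down is the kind of interaction structure the abstract describes trees as making visible. Note that a greedy splitter like this one can miss interactions that have no marginal effect at all (a pure XOR pattern), which is a known limitation of the technique.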