Combining Directed Evolution with Machine Learning Enables Accurate Genotype-to-Phenotype Predictions

Abstract

Linking sequence variation to phenotypic effects is critical for efficient exploitation of large genomic datasets. Here we present a novel approach combining directed evolution with protein language modeling to characterize naturally evolved variants of a rice immune receptor. Using high-throughput directed evolution, we engineered the rice immune receptor Pik-1 to bind and recognize the fungal proteins Avr-PikC and Avr-PikF, which evade detection by currently characterized Pik-1 alleles. A protein language model was fine-tuned on these data to correlate sequence variation with ligand binding behavior. The fine-tuned model was then used to characterize Pik-1 variants found in the 3,000 Rice Genomes Project dataset. Two variants scored highly for binding against Avr-PikC, and in vitro analyses confirmed their improved ligand binding relative to the wild-type Pik-1 receptor. Overall, this machine learning approach identified promising sources of disease resistance in rice and shows potential utility for exploring the phenotypic variation of other proteins of interest.
