Challenges in Predicting Chromatin Accessibility Differences between Species
Abstract
Enhancers are transcriptional regulatory elements that help drive phenotypic diversity, yet they often undergo rapid sequence evolution despite functional conservation, posing a challenge for predicting their function across species. Machine learning models that predict quantitative enhancer activity using DNA sequence have not previously been evaluated for their ability to predict quantitative differences across orthologous regions. Here, we trained convolutional neural networks (CNNs) on a regression task to predict chromatin accessibility, which is a proxy for enhancer activity, in the liver across five mammals, and we developed a novel framework to evaluate cross-species performance. We demonstrated that training on multiple species improves model generalization to both species used in training and held-out species. However, the models consistently achieved poor performance in predicting quantitative differences in accessibility between species at orthologous regions. Our study highlights the challenges in using regression models to predict chromatin accessibility changes between species.
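The gap the abstract describes, between accurate predictions within each species and poor predictions of between-species differences, can be illustrated with a small sketch. This is not the authors' evaluation framework: the function names (`pearson`, `evaluate_ortholog_differences`), the choice of Pearson correlation as the metric, and the synthetic data are all assumptions made for illustration, and the numbers it produces are not results from the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def evaluate_ortholog_differences(obs_a, obs_b, pred_a, pred_b):
    """Compare within-species accuracy with accuracy on between-species differences.

    Each list holds one accessibility value per orthologous region pair,
    in the same order for species A and species B (hypothetical layout).
    """
    obs_diff = [a - b for a, b in zip(obs_a, obs_b)]
    pred_diff = [a - b for a, b in zip(pred_a, pred_b)]
    return {
        "within_a": pearson(obs_a, pred_a),
        "within_b": pearson(obs_b, pred_b),
        "difference": pearson(obs_diff, pred_diff),
    }


# Synthetic toy data (an assumption, not data from the study):
# accessibility = signal shared by both orthologs + species-specific deviation.
rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(2000)]
obs_a = [s + rng.gauss(0, 0.3) for s in shared]
obs_b = [s + rng.gauss(0, 0.3) for s in shared]

# A hypothetical model that recovers only the shared signal, plus its own noise.
pred_a = [s + rng.gauss(0, 0.1) for s in shared]
pred_b = [s + rng.gauss(0, 0.1) for s in shared]

metrics = evaluate_ortholog_differences(obs_a, obs_b, pred_a, pred_b)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy setup the within-species correlations are high because most of the variance is shared between orthologs, while the correlation on differences is near zero because the hypothetical model carries no information about the species-specific deviations. Under these assumptions, strong per-species regression performance does not by itself imply that a model can predict quantitative accessibility changes between species.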