GECSI: Large-scale chromatin state imputation from gene expression

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Compendiums of chromatin state annotations based on integrating maps of multiple epigenetic marks such as from ChromHMM have become a powerful resource. While these compendiums have coverage of many biological samples, there are many additional biological samples that have gene expression data but lack epigenetic mark data and chromatin state annotations. The EpiAtlas resource of the International Human Epigenome Consortium (IHEC) contains a large compendium of chromatin state annotations for which many samples have matched gene expression data, which provides the opportunity to use it to train models to predict chromatin state annotations in additional biological samples with only gene expression data available. To address this, we develop Gene Expression-based Chromatin State Imputation (GECSI), which uses a multi-class logistic regression model trained using a large compendium of gene expression and chromatin state annotations, and apply it to IHEC data. Using cross-validation, we find that GECSI accurately predicts chromatin state assignments and generates probability estimates that are predictive of observed chromatin states, overall outperforming multiple other alternative and baseline methods. GECSI-predicted chromatin states reflect relationships among biological samples and show similar transcription factor and gene annotation enrichments as observed chromatin states. Using available IHEC gene expression data, we apply GECSI to predict chromatin state annotations for 449 additional epigenomes. We expect these predicted annotations and the GECSI software will be a useful resource for chromatin state analyses in many additional biological samples.

Article activity feed