Reference-guided Genome Assembly of Long Non-coding RNA Transcripts Reveals Target Genes Associated With Crohn’s Disease

Abstract

Crohn’s disease (CD) has no cure and is highly heterogeneous in presentation and progression. Molecular phenotyping has been used to elucidate cellular and tissue-based alterations and to characterize the drivers and effects of disease. One understudied class of functional molecules is long non-coding RNAs (lncRNAs). Studying the full lncRNA landscape in inflammatory bowel disease (IBD) is challenging, due in part to incomplete lncRNA annotation and a lack of functional characterization in the tissues of interest. We used a genome-guided alignment strategy to assemble predicted lncRNA transcripts from short RNA-sequencing data generated from colon tissue of adult patients. After combining our predicted lncRNAs with previous lncRNA annotations, we identified 98 lncRNAs that were differentially expressed, recapitulating many from previous IBD studies while also uncovering new ones. We built gene co-expression networks to cluster lncRNAs with functionally characterized protein-coding genes. Clusters containing differential lncRNAs were correlated with disease status and associated with pathways related to the humoral immune response, metabolism, and tissue regeneration. We uncovered multiple differential lncRNAs whose expression correlated significantly with that of nearby differential protein-coding genes that are also differentially expressed in other IBD datasets, such as PITX2. We focused on a predicted lncRNA that is antisense to the PITX2-adjacent lncRNA PANCR, which we called PANCR-AS1, and provide multiple lines of evidence supporting a role for PANCR-AS1 as an enhancer of PITX2 expression. Overall, we identified lncRNAs that are potential contributors to CD pathogenesis. We developed a robust pipeline for identifying lncRNAs in diseased and non-diseased tissue that are absent from reference annotations. We also outlined a framework for pinpointing significant disease-associated lncRNAs with potential functional activity related to their nearby protein-coding genes.
