Organ-specific prioritization and annotation of non-coding regulatory variants in the human genome
Abstract
Identifying non-coding regulatory variants in the human genome remains a challenging task in genomics. We recently released the second version of our leading regulatory variant database, RegulomeDB. Building on this comprehensive database, we developed a novel machine-learning architecture, TLand, which uses RegulomeDB-derived features to predict regulatory variants at the cell-specific or organ-specific level. In holdout benchmarking, TLand consistently outperformed state-of-the-art models, demonstrating its ability to generalize to new cell lines or organs. To overcome the common model bias toward cell lines or organs with high data availability, we trained three types of organ-specific TLand models. These models accurately prioritize the relevant organs for 2 million SNPs associated with GWAS traits. Moreover, the top-scoring variants in specific organ models were highly enriched for relevant GWAS traits. We expect that TLand and RegulomeDB will further advance our ability to understand human regulatory variants genome-wide.