The Clinical Genomic Variation Landscape
Abstract
Interpreting genomic variation requires analysts to collate and process information from disparate genomic evidence resources to discern the contributions of variants to diseases and drug responses. Differences in variant representation across these evidence repositories, including nomenclature (e.g., HGVS, SPDI), reference sequence context (e.g., GRCh37 or GRCh38 genome assemblies), sequence annotation sources (e.g., RefSeq or Ensembl), and aggregate variant concepts (e.g., canonical alleles), collectively make it difficult to determine whether (and how) genomic variants are associated with clinical outcomes. We evaluated these challenges across established genomic knowledge resources, including content from the CIViC, Molecular Oncology Almanac, and ClinVar knowledgebases, as compared against real-world small variant and CNV data. We used these findings to develop a suite of variant normalization methods that address these challenges. We present our findings, an analysis of the remaining gaps in the representation of variation data, and recommendations for the continued development of genomic knowledge standards to close those gaps.
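To make the nomenclature problem concrete, the sketch below rewrites a simple genomic substitution from HGVS into SPDI. It is an illustration only, not the normalization suite described in the abstract: the function name `hgvs_g_substitution_to_spdi` and the regular expression are assumptions introduced here, and the code handles only single-nucleotide `g.` substitutions on an explicit reference accession.

```python
import re

# Hypothetical helper for illustration; not part of the authors' methods.
# Matches only simple genomic substitutions, e.g. "NC_000007.14:g.140753336A>T".
_HGVS_G_SUB = re.compile(
    r"^(?P<acc>[^:]+):g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def hgvs_g_substitution_to_spdi(hgvs: str) -> str:
    """Convert a single-nucleotide HGVS g. substitution to an SPDI string."""
    match = _HGVS_G_SUB.match(hgvs)
    if match is None:
        raise ValueError(f"not a simple g. substitution: {hgvs!r}")
    # HGVS positions are 1-based, while SPDI uses 0-based interbase
    # coordinates, so the position shifts down by one.
    position = int(match["pos"]) - 1
    return f"{match['acc']}:{position}:{match['ref']}:{match['alt']}"


if __name__ == "__main__":
    print(hgvs_g_substitution_to_spdi("NC_000007.14:g.140753336A>T"))
```

Even in this simplest case the two strings differ in coordinate convention, so a direct string comparison across resources would miss the match. Insertions, deletions, transcript-level expressions, and differences in assembly all require considerably more than this sketch covers.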