TRCompDB: A reference of human tandem repeat sequence and composition variation from long-read assemblies

Abstract

Tandem repeats (TRs), including short tandem repeats (STRs) and variable-number tandem repeats (VNTRs), are hypermutable genetic elements consisting of tandem arrays of repeated motifs. TR variation can modify gene expression and has been implicated in over 50 diseases through repeat mutation and pathogenic expansion. Recent advances in long-read sequencing (LRS) enable comprehensive profiling of TR variation in large cohorts. We previously developed vamos, a tool for annotating motif count and composition in LRS samples. Here, we expand the functionality of vamos with new methods for constructing motif databases that improve motif consistency, and with tryvamos, a toolset for rapid analysis of vamos output. We demonstrate that vamos motif composition annotations reflect the underlying genomes more accurately than other approaches to TR annotation. By applying vamos to 360 LRS assemblies of diverse ancestries, we constructed TRCompDB, a reference database of tandem repeat variation across 805,485 STR and 370,468 VNTR loci on the CHM13 reference genome. Using tryvamos for genome-wide testing, we identified 6,039 loci exhibiting strong signatures of population divergence in length or composition, yielding insight into the stratification of TR loci.
