STRkit: precise, read-level genotyping of short tandem repeats using long reads and single-nucleotide variation

Abstract

Variation in short tandem repeats (STRs) is implicated in Mendelian disease and complex traits, but can be difficult to resolve with short-read genome sequencing. We present STRkit, a software package for genotyping STRs from long-read sequencing (LRS) data that uses nearby single-nucleotide variants to improve genotyping accuracy without a priori haplotype information. We show that STRkit has unique strengths versus other methods: it can use data from both major LRS technologies (Pacific Biosciences HiFi [PB] and Oxford Nanopore [ONT]) to output both allele-level and read-level copy number and sequence; it performs best in benchmarking, with F1 scores of 0.9633 and 0.9056 with PB and ONT data, respectively; it achieves a Mendelian inheritance rate of 97.86% with PB data; and it is open-source software. STRkit's features open up new possibilities for association testing, assessing patterns of STR inheritance, and better understanding the functional effects of these notable repeat elements.
