Complex Indel Detection: A Simulation-Based Framework and Parsing with FreeBayes
Abstract
In contrast to simple deletions and simple insertions, most complex indels involve both deletions and insertions, often with base changes within a few nucleotides of the indel's left and right boundaries. These complex indels often arise from double-strand breaks (DSBs), which in normal somatic cells are predominantly repaired by nonhomologous DNA end joining (NHEJ). Complex indels pose a difficult analytical problem for existing indel callers because the observed VCF representation may be locally shifted, extended with matching flanking bases, or fragmented into several closely spaced calls. To evaluate how complex indels are represented, we tested six variant-calling approaches: FreeBayes, HaplotypeCaller, Mutect2, Strelka2, and the DRAGEN Germline and DRAGEN Somatic pipelines. Among these, FreeBayes most consistently represented simulated complex indels as single nearby variant records. We then developed a parsing workflow that derives the effective deleted and inserted sequences from FreeBayes VCF output and enriches for candidate complex indels. This approach supports analysis of naturally occurring DSB repair events in single human colon crypts.
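To illustrate what deriving effective deleted and inserted sequences can mean for a record padded with matching flanking bases, here is a minimal Python sketch. It is not the authors' workflow. The function names `trim_shared_flanks` and `is_candidate_complex`, the suffix-then-prefix trimming order, and the rule used to flag a candidate are all assumptions made for this example.

```python
def trim_shared_flanks(pos, ref, alt):
    """Strip bases shared by REF and ALT at both ends.

    Returns (pos, deleted, inserted), where pos is the 1-based position
    of the first base remaining after prefix trimming.
    Assumption: the shared suffix is trimmed before the shared prefix.
    """
    # Remove matching bases from the right-hand end.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Remove matching bases from the left-hand end, advancing the position.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def is_candidate_complex(deleted, inserted):
    """Hypothetical filter: bases are both removed and added, and the
    event is not a single-base substitution."""
    return bool(deleted) and bool(inserted) and (len(deleted) > 1 or len(inserted) > 1)


# A record whose REF/ALT carry matching flanking bases on both sides.
pos, deleted, inserted = trim_shared_flanks(100, "ACGTTA", "ACCCA")
print(pos, deleted, inserted, is_candidate_complex(deleted, inserted))
```

With this rule, a simple deletion such as REF `ATG`, ALT `A` reduces to a deleted `TG` with an empty inserted sequence, so it is not flagged. A real workflow would also need to handle multi-allelic records and fragmented calls, which this sketch ignores.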