Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Structural variants (SVs) contribute significantly to human genetic diversity and disease 1–4 . Previously, SVs have remained incompletely resolved by population genomics, with short-read sequencing facing limitations in capturing the whole spectrum of SVs at nucleotide resolution 5–7 . Here we leveraged nanopore sequencing 8 to construct an intermediate coverage resource of 1,019 long-read genomes sampled within 26 human populations from the 1000 Genomes Project. By integrating linear and graph-based approaches for SV analysis via pangenome graph-augmentation, we uncover 167,291 sequence-resolved SVs in these samples, considerably advancing SV characterization compared to population-wide short-read sequencing studies 3,4 . Our analysis details diverse SV classes—deletions, duplications, insertions, and inversions—at population-scale. LINE-1 and SVA retrotransposition activities frequently mediate transductions 9,10 of unique sequences, with both mobile element classes transducing sequences at either the 3′- or 5′-end, depending on the source element locus. Furthermore, analyses of SV breakpoint junctions suggest a continuum of homology-mediated rearrangement processes are integral to SV formation, and highlight evidence for SV recurrence involving repeat sequences. Our open-access dataset underscores the transformative impact of long-read sequencing in advancing the characterisation of polymorphic genomic architectures, and provides a resource for guiding variant prioritisation in future long-read sequencing-based disease studies.

Article activity feed