Population-Specific Structural Variant Landscape in a Puerto Rican Rare Disease Cohort

Abstract

The integration of long-read PacBio High-Fidelity (HiFi) sequencing with the complete Telomere-to-Telomere CHM13 (T2T-CHM13) reference genome has enabled thorough characterization of structural variants (SVs) in previously inaccessible genomic regions, yet Puerto Rican and broader admixed populations remain critically underrepresented in these advances. We performed HiFi whole-genome sequencing on 90 samples from 30 parent-proband trios (15 European, 15 Puerto Rican) in the Genomic Answers for Kids (GA4K) program and aligned the reads to T2T-CHM13, identifying 1,729,471 deletions, 18,805 duplications, 1,203,260 insertions, and 2,872 inversions after stringent filtering. Puerto Rican individuals carried significantly more SVs, with enrichment in centromeric/pericentromeric and telomeric regions. SV genotypes provided strong ancestry discrimination (72.3% of total variance by MDS versus 8.6% for SNVs), and ancestry-associated deletions and duplications were predominantly Puerto Rican. Functionally, Puerto Rican-enriched SVs intersected constrained and dosage-sensitive genes, including recurrent UTR and coding events with plausible regulatory or dosage effects. Together, these findings demonstrate that structural variants exhibit significant population-specific distributions and underscore the importance of combining complete reference genomes with long-read sequencing for ancestry-considerate interpretation.
