Population-Specific Structural Variant Landscape in a Puerto Rican Rare Disease Cohort
Abstract
The integration of long-read PacBio High-Fidelity (HiFi) sequencing with the complete Telomere-to-Telomere CHM13 (T2T-CHM13) reference genome has enabled thorough characterization of structural variants (SVs) in previously inaccessible genomic regions, yet Puerto Rican and other admixed populations remain critically underrepresented in these advances. We performed HiFi whole-genome sequencing on 90 samples from 30 parent-proband trios (15 European, 15 Puerto Rican) in the Genomic Answers for Kids (GA4K) program and aligned the reads to T2T-CHM13. After stringent filtering, we identified 1,729,471 deletions, 18,805 duplications, 1,203,260 insertions, and 2,872 inversions. Puerto Rican individuals carried significantly more SVs than European individuals, with enrichment in centromeric/pericentromeric and telomeric regions. SV genotypes provided strong ancestry discrimination (72.3% of total variance captured by multidimensional scaling (MDS), versus 8.6% for single-nucleotide variants (SNVs)), and among deletions and duplications, ancestry-associated SVs were predominantly enriched in Puerto Rican individuals. Functionally, Puerto Rican-enriched SVs intersected constrained and dosage-sensitive genes, including recurrent UTR and coding events with plausible regulatory or dosage effects. Together, these findings demonstrate that SVs show significant population-specific distributions and underscore the importance of combining complete reference genomes with long-read sequencing for ancestry-aware variant interpretation.