On the analysis of genetic association with long-read sequencing data

Abstract

Long-read sequencing (LRS) technologies have enhanced the ability to resolve complex genomic architecture and to determine the ‘phase’ relationships of genetic variants over long distances. Although genome-wide association studies (GWAS) identify individual variants associated with complex traits, they do not typically account for whether multiple associated signals at a locus act in cis or in trans, or whether they reflect allelic heterogeneity. As a result, effects that arise specifically from phase relationships may remain hidden in analyses of short-read and microarray data. While LRS now enables accurate measurement of phase in population cohorts, statistical methods that leverage phase in genetic association analysis remain underdeveloped. Here, we introduce the Regression on Phase (RoP) method, which directly models cis and trans phase effects between variants within a regression framework. In simulations, RoP outperforms genotype interaction tests, which detect phase effects only indirectly, and distinguishes in-cis from in-trans phase effects. We applied RoP to two cystic fibrosis (CF) modifier loci discovered by GWAS. At the chromosome 7q35 trypsinogen locus, RoP confirmed that two variants contribute independently (allelic heterogeneity). At the SLC6A14 locus on chromosome X, phase analysis uncovered a coordinated regulatory mechanism in which a promoter variant modulates lung phenotypes in individuals with CF when acting in cis with a lung-specific enhancer (E2765449/enhD). Functional studies confirmed this coordinated regulation. These findings highlight the potential of leveraging phase information from LRS in genetic association studies. Analyzing phase effects with RoP can provide deeper insight into the complex genetic architectures underlying disease phenotypes, guiding more informed functional investigations and potentially revealing new therapeutic targets.
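
The abstract says RoP models cis and trans phase effects in a regression but does not give its form. The sketch below is a hedged illustration only, not the RoP model itself. It assumes two biallelic variants (A and B) with phased haplotypes coded 0/1 (1 = alternate allele). It encodes cis and trans alternate-allele pairings as counts and fits them alongside additive dosages with ordinary least squares. The function names `phase_features` and `fit_phase_model`, the allele frequency, and the simulated effect size are all hypothetical. The sketch also shows why an unphased genotype interaction term sees phase effects only indirectly. The product of the two dosages always equals the cis count plus the trans count, so a double heterozygote receives the same interaction value whether its alleles are in cis (AB/ab) or in trans (Ab/aB).

```python
import numpy as np


def phase_features(hap1, hap2):
    """Hypothetical encoding of two biallelic variants from phased haplotypes.

    hap1, hap2: integer arrays of shape (n, 2); column 0 is variant A,
    column 1 is variant B, and 1 marks the alternate allele.
    """
    a1, b1 = hap1[:, 0], hap1[:, 1]
    a2, b2 = hap2[:, 0], hap2[:, 1]
    g_a = a1 + a2              # additive dosage of variant A (0, 1, 2)
    g_b = b1 + b2              # additive dosage of variant B (0, 1, 2)
    cis = a1 * b1 + a2 * b2    # haplotypes carrying both alternate alleles
    trans = a1 * b2 + a2 * b1  # A and B alternate alleles on opposite haplotypes
    return g_a, g_b, cis, trans


def fit_phase_model(y, hap1, hap2):
    """Ordinary least squares of y on [1, g_a, g_b, cis, trans] (illustrative)."""
    g_a, g_b, cis, trans = phase_features(hap1, hap2)
    X = np.column_stack([np.ones_like(y), g_a, g_b, cis, trans])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "g_a", "g_b", "cis", "trans"], beta))


# Double heterozygotes: identical dosages and dosage product, different phase.
cis_het = phase_features(np.array([[1, 1]]), np.array([[0, 0]]))    # AB / ab
trans_het = phase_features(np.array([[1, 0]]), np.array([[0, 1]]))  # Ab / aB

# Toy simulation (assumed parameters): a trait driven only by the cis configuration.
rng = np.random.default_rng(0)
n, freq = 5000, 0.3
hap1 = (rng.random((n, 2)) < freq).astype(int)
hap2 = (rng.random((n, 2)) < freq).astype(int)
g_a, g_b, cis, trans = phase_features(hap1, hap2)

# (a1 + a2)(b1 + b2) = cis + trans, so the genotype product mixes both configurations.
assert np.array_equal(g_a * g_b, cis + trans)

y = 0.5 * cis + rng.normal(size=n)
est = fit_phase_model(y, hap1, hap2)
print({k: round(float(v), 3) for k, v in est.items()})
```

In this toy setting the `cis` coefficient should fall near the simulated 0.5 and the `trans` coefficient near zero, whereas a single `g_a * g_b` term would blend the two. The actual RoP method may use a different coding, covariates, and test statistic.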

Author summary

Traditional genetic association studies typically link individual genetic variants to diseases but often neglect how variants may jointly affect outcomes depending on their arrangement across the maternal and paternal chromosomes, known as their phase relationship. Understanding phase effects is essential for uncovering the mechanisms underlying complex diseases. Recent advances in long-read sequencing technology allow precise measurement of phase relationships over extensive chromosomal regions; however, statistical methods for analyzing these effects remain limited. We developed a novel statistical approach called Regression on Phase (RoP) to directly assess these phase-dependent genetic interactions. Our simulation studies demonstrated that RoP effectively identifies effects that depend on specific phase arrangements. Applying RoP to genetic modifiers of cystic fibrosis (CF) revealed phase-dependent mechanisms affecting CF-related lung disease that traditional methods had missed. Analyzing phase effects with RoP can advance our understanding of disease mechanisms, guide future functional studies, and ultimately support the development of personalized medicine.
