Statistical Approach Leveraging Genealogies of Populations with a Founder Effect and Identical by Descent Segments to Identify Rare Variants in Complex Diseases

Abstract

The missing heritability attributed to rare variants (RVs) poses a significant challenge to established statistical methods. Our study aims to detect RVs using identical-by-descent (IBD) segments as a proxy for recent variants in family data from a population with a founder effect for which a genealogy is available, a distinguishing feature of our approach. Inferring IBD segments from genotype array data, which are more accessible than whole-genome sequences, enables application to large samples. Our approach divides the genome into fixed-length windows, treats each window as a synthetic genomic region (SG), and then identifies groups of affected individuals sharing a specific IBD segment over an SG, using pairwise IBD segments inferred from genotype array data. The pairwise IBD segments are then used to identify densely connected haplotypes as IBD clusters via DASH. Lastly, we adapt, implement, and evaluate statistics to test for enrichment of IBD sharing among affected individuals within SGs. The null distribution of the genome-wide maximal statistic value is obtained by simulating whole-genome transmission in a genealogy using msprime. As an application, we study Eastern Quebec as an example of a population with a founder effect. Using the BALSAC database to reconstruct the genealogy of 1,200 subjects across 48 multi-generational schizophrenia and bipolar disorder families led to an 18-generation pedigree with 84% completeness at the 10th generation. The statistic denoted S_msg, for the “most shared haplotype in an SG”, exhibits superior power in detecting causal SGs compared with the adapted S_all measure and (with a single causal variant in a region) with GMMAT (Generalized Linear Mixed Model Association Test) applied to IBD clusters.
Our analysis of the schizophrenia and bipolar disorder data reveals no regions that surpass conventional significance thresholds for harboring rare variants associated with these disorders. Two distinct regions, on chromosomes 5 and 11, stand out due to their maximal S_msg values. These findings underscore the potential of leveraging genealogical data and IBD segments to uncover rare variants in complex diseases.
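
To make the windowing and "most shared" idea concrete, here is a minimal Python sketch. It is illustrative only: the abstract does not give the exact definition of S_msg, so the statistic below (the largest number of affected individuals found in any single IBD cluster within a window) is a simplified stand-in. The function names, the toy cluster data, and the window size are all assumptions, and clusters are represented as sets of individuals rather than haplotypes. The msprime-based null distribution described above is not reproduced here.

```python
def make_windows(chrom_length, window_size):
    """Split [0, chrom_length) into consecutive fixed-length windows (the SGs)."""
    return [(start, min(start + window_size, chrom_length))
            for start in range(0, chrom_length, window_size)]


def most_shared_count(cluster_members, affected):
    """Toy statistic (assumption, not the paper's S_msg): the largest number
    of affected individuals belonging to any one IBD cluster in a window."""
    counts = [len(set(members) & affected) for members in cluster_members.values()]
    return max(counts, default=0)


# Hypothetical input: window index -> {cluster id -> individuals in that cluster}.
clusters_by_window = {
    0: {"c1": ["a", "b", "x"], "c2": ["c", "y"]},
    1: {"c3": ["a", "b", "c", "d"]},
    2: {},
}
affected = {"a", "b", "c", "d"}

# Hypothetical chromosome length and window size, in base pairs.
windows = make_windows(chrom_length=2_500_000, window_size=1_000_000)

# One statistic per window, then the genome-wide maximum over windows.
stats = {
    w: most_shared_count(clusters_by_window.get(w, {}), affected)
    for w in range(len(windows))
}
genome_wide_max = max(stats.values())
```

In a full analysis, the observed genome-wide maximum would be compared with the distribution of maxima obtained under the null, which the authors generate by simulating transmission through the genealogy.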
