The causes of signed linkage disequilibrium within genomic datasets

Abstract

Several recent studies have quantified signed linkage disequilibrium (LD) among mutations in genomic datasets, often reporting positive LD, particularly among mutations presumed to be less deleterious, such as synonymous variants. In this article, we investigate two potential sources of this positive LD: the focus on rare alleles, as adopted in several previous studies, and errors arising in the mapping of short-read sequences onto a reference genome. Using coalescent simulations, we extend previous theoretical results on the effect of focusing on rare alleles, and show that derived alleles present at similar frequencies tend to be in positive LD. Reanalyzing datasets from Capsella grandiflora and Drosophila melanogaster, we show that LD among synonymous derived alleles vanishes in the absence of any conditioning on frequency, while LD between mutations categorized as potentially deleterious by the SIFT4G program remains positive. However, we show that in both cases, this positive LD may be at least partly caused by the potential mismapping of a small fraction of sequences in some individuals, which could be a consequence of structural variants that are absent from the reference genome. Overall, these results show that average signed LD among mutations can be strongly affected by technical artifacts even if these concern only a minority of variants. Finally, we discuss other possible sources of positive LD among deleterious mutations.
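
For readers unfamiliar with the quantity discussed above, signed LD between the derived alleles at two sites is conventionally written D = f_ij - f_i * f_j, where f_i and f_j are the derived-allele frequencies and f_ij is the frequency of haplotypes carrying both derived alleles. The sketch below is only an illustration of that definition and of averaging it over pairs of sites; it is not the authors' pipeline. The function names, the 0/1 coding of ancestral and derived alleles, and the phased haplotype-matrix input are all assumptions made for this example.

```python
from itertools import combinations


def signed_ld(haplotypes, i, j):
    """Signed LD D = f_ij - f_i * f_j between derived alleles at sites i and j.

    `haplotypes` is a list of phased haplotypes, each a list of 0/1 values
    (0 = ancestral, 1 = derived). This coding is an assumption of the sketch.
    """
    n = len(haplotypes)
    f_i = sum(h[i] for h in haplotypes) / n
    f_j = sum(h[j] for h in haplotypes) / n
    f_ij = sum(h[i] * h[j] for h in haplotypes) / n
    return f_ij - f_i * f_j


def mean_signed_ld(haplotypes):
    """Average D over all pairs of sites, with no conditioning on frequency."""
    n_sites = len(haplotypes[0])
    pairs = list(combinations(range(n_sites), 2))
    return sum(signed_ld(haplotypes, i, j) for i, j in pairs) / len(pairs)


# Toy data: derived alleles always co-occur (coupling) or never do (repulsion).
coupling = [[1, 1], [1, 1], [0, 0], [0, 0]]
repulsion = [[1, 0], [0, 1], [1, 0], [0, 1]]
d_coupling = signed_ld(coupling, 0, 1)
d_repulsion = signed_ld(repulsion, 0, 1)
```

In the toy data, derived alleles found on the same haplotypes give positive D and derived alleles found on different haplotypes give negative D. Restricting `mean_signed_ld` to pairs of rare alleles at similar frequencies would be one way to explore the frequency conditioning the abstract describes.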
