ParaMask, a new method to identify multicopy genomic regions, corrects major biases in whole-genome sequencing data

Abstract

Multicopy genomic regions are repeated sequences that can bias genomics analyses. Here, we present a method to identify and filter multicopy regions in population-level genomic data of any species. The broad applicability of this method stems from a flexible Expectation-Maximization framework to detect excess heterozygosity while simultaneously fitting inbreeding levels. By combining this signature with read ratio deviations, excess sequencing coverage, and a clustering technique, our method attains high power. We show that multicopy regions create biases that confound evolutionary genomics analyses, and that by identifying these regions with our method and filtering them, we can correct these biases.
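
To make the idea of detecting excess heterozygosity while fitting inbreeding more concrete, here is a minimal toy sketch in Python. It is not the ParaMask implementation and does not reproduce its model. It assumes a two-component binomial mixture in which single-copy sites have an expected heterozygote frequency of 2p(1-p)(1-F), the standard expectation under an inbreeding coefficient F, and multicopy sites share one elevated heterozygote frequency. The function names, the grid search over F, and the simulated data are all hypothetical choices made for illustration. Read-ratio deviations, coverage, and clustering are left out.

```python
import numpy as np

def binom_loglik(k, n, h):
    """Binomial log-likelihood of k heterozygotes among n samples (constant dropped)."""
    h = np.clip(h, 1e-9, 1 - 1e-9)
    return k * np.log(h) + (n - k) * np.log(1 - h)

def em_excess_het(k, n, p, n_iter=50):
    """Toy EM for a two-component mixture (an assumption, not the published model).

    k: heterozygote count per site, n: number of samples, p: allele frequency per site.
    Returns (F, h_multi, pi_multi, posterior probability of 'multicopy' per site).
    """
    c = 2 * p * (1 - p)                    # Hardy-Weinberg heterozygosity per site
    F_grid = np.linspace(0.0, 0.99, 100)   # candidate inbreeding coefficients
    F, h_multi, pi_multi = 0.0, 0.9, 0.1   # arbitrary starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each site is multicopy
        ll_single = np.log(1 - pi_multi) + binom_loglik(k, n, c * (1 - F))
        ll_multi = np.log(pi_multi) + binom_loglik(k, n, h_multi)
        post = 1.0 / (1.0 + np.exp(np.clip(ll_single - ll_multi, -700, 700)))
        # M-step: update mixing proportion and multicopy heterozygote frequency
        pi_multi = float(np.clip(post.mean(), 1e-6, 1 - 1e-6))
        h_multi = float(np.sum(post * k) / (n * np.sum(post) + 1e-12))
        # M-step for F: weighted likelihood over single-copy sites, by grid search
        w = 1 - post
        scores = [np.sum(w * binom_loglik(k, n, c * (1 - f))) for f in F_grid]
        F = float(F_grid[int(np.argmax(scores))])
    return F, h_multi, pi_multi, post

# Simulated example data (entirely made up for this sketch)
rng = np.random.default_rng(0)
n_samples, F_true = 50, 0.3
p_single = rng.uniform(0.1, 0.5, size=2000)
k_single = rng.binomial(n_samples, 2 * p_single * (1 - p_single) * (1 - F_true))
p_multi = np.full(200, 0.45)               # collapsed paralogs look near 0.5
k_multi = rng.binomial(n_samples, 0.9, size=200)

k_all = np.concatenate([k_single, k_multi])
p_all = np.concatenate([p_single, p_multi])
F_hat, h_hat, pi_hat, post = em_excess_het(k_all, n_samples, p_all)
print(F_hat, h_hat, pi_hat)
```

In this toy setting the two site classes are well separated, so the posterior alone flags the simulated multicopy sites. Real data would likely be far noisier, which is presumably why the method described above combines the heterozygosity signal with other evidence.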
