Deceptive combined effects of short allele dominance and stuttering: an example with Ixodes scapularis, the main vector of Lyme disease in the U.S.A.

Abstract

Null alleles, short allele dominance (SAD), and stuttering increase the perceived relative inbreeding of individuals and subpopulations as measured by Wright's F_IS and F_ST. Ascertainment bias due to such amplification problems is usually caused by inaccurate primer design (if primers were developed from a different species or a distant population), poor DNA quality, low DNA concentration, or a combination of some or all of these sources of inaccuracy. When combined, these issues can increase the correlation between polymorphisms at the affected loci and, consequently, the linkage disequilibrium (LD) between them. In this note, we studied an original microsatellite data set generated by analyzing nine loci in Ixodes scapularis ticks from the eastern U.S.A. To detect null alleles and SAD, we used correlation methods and variation measures. To detect stuttering, we evaluated the heterozygote deficit between alleles displaying a single repeat difference. We demonstrated that a substantial proportion of loci affected by amplification problems (one with null alleles, two with SAD and three with stuttering) led to highly significant heterozygote deficits (F_IS=0.1, p-value<0.0001). This occurred together with a substantial proportion (22%) of pairs of loci in significant LD, two of which were still significant after a false discovery rate (FDR) correction, and some variation in the measurement of population subdivision across loci (Wright's F_ST). This suggested a strong Wahlund effect and/or selection at several loci. By finding small peaks corresponding to previously disregarded larger alleles in some homozygous profiles for loci with SAD and by pooling alleles close in size for loci with stuttering, we generated an amended dataset.
Except for one locus with null alleles and another still displaying a modest SAD, the analyses of the corrected dataset revealed a significant excess of heterozygotes (F_IS=-0.07), as expected in dioecious and strongly subdivided populations, with a more reasonable proportion (19%) of pairs of loci characterized by significant LD, none of which stayed significant after the FDR procedure. Strong subdivision was also confirmed by the standardized F_ST corrected for null alleles (F_ST=0.19) and small effective subpopulation sizes (N_e=7).
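
To make the stuttering correction concrete, the following is a minimal sketch of how pooling alleles one repeat apart can remove an artefactual heterozygote deficit. It is not the authors' procedure or software: the function names `f_is` and `pool_alleles`, the allele sizes, and the genotype counts are hypothetical, and F_IS is computed with a simplified single-locus, single-subsample estimator (1 - H_o/H_s, with Nei's unbiased H_s) rather than the multi-sample estimators that population genetics packages normally report.

```python
from collections import Counter


def f_is(genotypes):
    """Simplified F_IS = 1 - H_o / H_s for one locus in one subsample.

    genotypes: list of (allele, allele) tuples for diploid individuals.
    H_s is Nei's unbiased expected heterozygosity (an assumption made here
    for illustration, not necessarily the estimator used in the preprint).
    """
    n = len(genotypes)
    total = 2 * n  # number of gene copies sampled
    counts = Counter(a for g in genotypes for a in g)
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    sum_p2 = sum((c / total) ** 2 for c in counts.values())
    h_exp = (total / (total - 1)) * (1 - sum_p2)
    return 1 - h_obs / h_exp


def pool_alleles(genotypes, mapping):
    """Recode alleles through `mapping`, e.g. pooling sizes one repeat apart."""
    return [(mapping.get(a, a), mapping.get(b, b)) for a, b in genotypes]


# Hypothetical scored genotypes at a dinucleotide locus: with stuttering,
# 100/102 heterozygotes cannot be told apart from homozygotes, so none are
# scored as heterozygous.
observed = (
    [(100, 100)] * 4
    + [(102, 102)] * 4
    + [(100, 110)] * 4
    + [(102, 110)] * 4
    + [(110, 110)] * 2
)

# Pool allele 102 with allele 100 and recompute F_IS.
pooled = pool_alleles(observed, {102: 100})

print(round(f_is(observed), 3))  # 0.352: apparent heterozygote deficit
print(round(f_is(pooled), 3))    # 0.028: close to panmictic expectations
```

Pooling trades allelic resolution for reliability: the pooled locus carries fewer alleles, but the genotypes that remain distinguishable are no longer biased by unreadable one-repeat heterozygotes.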

Article activity feed

  1. Genetic markers are used in modern population genetics/genomics to uncover the past neutral and selective history of populations and species. Besides Single Nucleotide Polymorphisms (SNPs) obtained from whole genome data, microsatellites (also called Short Tandem Repeats, STRs, or Simple Sequence Repeats, SSRs) have been common markers of choice in numerous population genetics studies of non-model species with large sample sizes [1]. Microsatellites can be used to draw inferences about past population demography (e.g. expansions, declines, bottlenecks), population splits, population structure and gene flow, but also about life history traits and modes of reproduction (e.g. [2,3]). These markers are widely used in conservation genetics [4] and to study parasites or disease vectors [5]. Microsatellites show a higher mutation rate than SNPs, which on the one hand increases the statistical power to infer recent events (for example crop domestication [2,3]) and on the other hand decreases their statistical power over longer time scales because of homoplasy [6].
    To perform such analyses, however, reliable, high-quality data are required. As emphasized in the article by De Meeûs et al. [7], three main issues bias the observed heterozygosity at microsatellites: null alleles, short allele dominance (SAD) and stuttering. These originate from poor PCR amplification. As a result, an excess of homozygosity is observed at the microsatellite loci, leading to overestimation of the variation statistics FIS and FST as well as increased linkage disequilibrium (LD). For null alleles, several methods and software packages help to reduce the bias, and in the present study De Meeûs et al. [7] propose a way to tackle issues with SAD and stuttering.
    The authors study a dataset consisting of 387 samples from 61 subsamples genotyped at nine loci of the species Ixodes scapularis, the tick that transmits Lyme disease. Using correlation methods together with FIS and FST, they uncover null alleles and SAD. Stuttering is detected by evaluating the heterozygote deficit between alleles displaying a single repeat difference. Without correction, six loci are affected by one of these amplification problems, generating a large deficit of heterozygotes (measured by significant FIS and FST) that remains after correction for the false discovery rate (FDR). These results would classically be interpreted as a strong Wahlund effect and/or selection at several loci.
    After correcting for null alleles, the authors apply two novel corrections: 1) a re-examination of the chromatograms reveals previously disregarded larger alleles, thus decreasing SAD, and 2) alleles close in size are pooled, thus decreasing stuttering. The corrected dataset then shows a significant excess of heterozygotes, as could be expected in a dioecious species with strong population structure. The FDR correction then removes the significant excess of homozygotes and the LD between pairs of loci. FST on the corrected dataset is used to demonstrate the strong population structure and small effective subpopulation sizes. This is confirmed by a clustering analysis using discriminant analysis of principal components (DAPC).
    While based on a specific dataset of ticks from different populations sampled across the USA, the generality of the authors' approach is presented in Figure 6, in which they provide a step-by-step flowchart to cure microsatellite datasets of null alleles, SAD and stuttering. Several criteria based on FIS, FST and LD between loci are used as decision keys in the flowchart. An Excel file is also provided to help with the curation steps. This study and the proposed methodology are thus extremely useful for all population geneticists working on non-model species with large numbers of samples genotyped at microsatellite markers. The method not only allows more accurate estimates of heterozygosity but also prevents the thinning of datasets due to the removal of problematic loci. As a follow-up and extension of this work, an exhaustive simulation study could investigate the influence of these data quality issues on inference of past demography and population structure under a wide range of scenarios. This would make it possible to quantify the current biases in the literature and the robustness of the methodology devised by De Meeûs et al. [7].

    References

    [1] Jarne, P., and Lagoda, P. J. (1996). Microsatellites, from molecules to populations and back. Trends in Ecology & Evolution, 11(10), 424-429. doi: 10.1016/0169-5347(96)10049-5
    [2] Cornille, A., Giraud, T., Bellard, C., Tellier, A., Le Cam, B., Smulders, M. J. M., Kleinschmit, J., Roldan-Ruiz, I. and Gladieux, P. (2013). Postglacial recolonization history of the European crabapple (Malus sylvestris Mill.), a wild contributor to the domesticated apple. Molecular Ecology, 22(8), 2249-2263. doi: 10.1111/mec.12231
    [3] Parat, F., Schwertfirm, G., Rudolph, U., Miedaner, T., Korzun, V., Bauer, E., Schön, C.-C. and Tellier, A. (2016). Geography and end use drive the diversification of worldwide winter rye populations. Molecular Ecology, 25(2), 500-514. doi: 10.1111/mec.13495
    [4] Broquet, T., Ménard, N., & Petit, E. (2007). Noninvasive population genetics: a review of sample source, diet, fragment length and microsatellite motif effects on amplification success and genotyping error rates. Conservation Genetics, 8(1), 249-260. doi: 10.1007/s10592-006-9146-5
    [5] Koffi, M., De Meeûs, T., Séré, M., Bucheton, B., Simo, G., Njiokou, F., Salim, B., Kaboré, J., MacLeod, A., Camara, M., Solano, P., Belem, A. M. G. and Jamonneau, V. (2015). Population genetics and reproductive strategies of African trypanosomes: revisiting available published data. PLoS Neglected Tropical Diseases, 9(10), e0003985. doi: 10.1371/journal.pntd.0003985
    [6] Estoup, A., Jarne, P., & Cornuet, J. M. (2002). Homoplasy and mutation model at microsatellite loci and their consequences for population genetics analysis. Molecular Ecology, 11(9), 1591-1604. doi: 10.1046/j.1365-294X.2002.01576.x
    [7] De Meeûs, T., Chan, C. T., Ludwig, J. M., Tsao, J. I., Patel, J., Bhagatwala, J., and Beati, L. (2019). Deceptive combined effects of short allele dominance and stuttering: an example with Ixodes scapularis, the main vector of Lyme disease in the USA. bioRxiv, 622373, ver. 4 peer-reviewed and recommended by Peer Community In Evolutionary Biology. doi: 10.1101/622373
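
As an illustration of the FDR step applied to the pairwise LD tests discussed above, here is a minimal sketch of a step-up FDR procedure. Neither the abstract nor the recommendation says which FDR procedure was used, so the Benjamini-Hochberg procedure is assumed here purely for illustration. The function name `benjamini_hochberg` and all p-values are hypothetical. Nine loci give 9 × 8 / 2 = 36 pairs, so 8 pairs significant at the 5% level would correspond to the reported 22%.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag tests that remain significant under the Benjamini-Hochberg
    step-up procedure (assumed here; the FDR method actually used in the
    preprint is not stated in this text)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Every test ranked at or below k_max is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            significant[idx] = True
    return significant


# Hypothetical p-values for the 36 pairs of loci: 8 are below 0.05.
ld_p_values = [0.0005, 0.002, 0.01, 0.02, 0.03, 0.035, 0.04, 0.045] + [0.5] * 28

flags = benjamini_hochberg(ld_p_values)
n_nominal = sum(p < 0.05 for p in ld_p_values)

print(n_nominal, "of", len(ld_p_values), "pairs significant before correction")
print(sum(flags), "pairs still significant after the FDR correction")
```

With these made-up values, 8 of 36 pairs are nominally significant and 2 survive the correction, which mirrors the pattern described for the uncorrected dataset without reproducing the authors' actual test results.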