Maximising informativeness for target capture-based phylogenomics in Erica (Ericaceae)

Seth D. Musker
Nicolai M. Nürk
Michael D. Pirie

Abstract

Plant phylogenetics has been revolutionised in the genomic era, with target capture acting as the primary workhorse of most recent research in the new field of phylogenomics. Target capture (aka Hyb-Seq) allows researchers to sequence hundreds of genomic regions (loci) of their choosing, at relatively low cost per sample, from which to derive phylogenetically informative data. Although this highly flexible and widely applicable method has rightly earned its place as the field’s de facto standard, it does not come without its challenges. In particular, users have to specify which loci to sequence, a surprisingly difficult task, especially when working with non-model groups, as it requires pre-existing genomic resources in the form of assembled genomes and/or transcriptomes. In the absence of taxon-specific genomic resources, target sets exist that are designed to work across broad taxonomic scales. However, the highly conserved loci that they target may lack informativeness for difficult phylogenetic problems, such as that presented by the rapid radiation of Erica in southern Africa. We designed a target set for Erica phylogenomics intended to maximise informativeness and minimise paralogy while maintaining universality by including genes from the widely used Angiosperms353 set. Comprising just over 300 genes, the targets had excellent recovery rates in roughly 90 Erica species as well as outgroups from Calluna, Daboecia, and Rhododendron, and had high information content as measured by parsimony informative sites and Quartet Internode Resolution Probability (QIRP) at shallow nodes. Notably, QIRP was positively correlated with intron content, while including introns in targets (rather than recovering them via exon-flanking “bycatch”) substantially improved intron recovery.
Overall, our results show the value of building a custom target set, and we provide a suite of open-source tools that can be used to replicate our approach in other groups (https://github.com/SethMusker/TargetVet).
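
One of the information-content measures named above, the number of parsimony informative sites, can be illustrated with a minimal sketch. It uses the standard textbook definition (a site is parsimony informative when at least two character states each occur in at least two sequences) and is not taken from the authors' TargetVet code. The function name `count_parsimony_informative_sites`, the list-of-strings input, and the decision to ignore gaps and `N`/`?`/`X` characters are all assumptions made for illustration.

```python
from collections import Counter

# Characters ignored when tallying states at a site (an assumption of this
# sketch; real pipelines may treat gaps and ambiguity codes differently).
IGNORED = set("-?NnXx")


def count_parsimony_informative_sites(alignment):
    """Count parsimony informative columns in an aligned set of sequences.

    `alignment` is a list of equal-length strings, one per sample.
    A column counts when at least two distinct states each appear
    in at least two sequences.
    """
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    informative = 0
    for column in zip(*alignment):
        # Tally each state at this site, case-insensitively, skipping gaps.
        counts = Counter(base.upper() for base in column if base not in IGNORED)
        states_seen_twice = sum(1 for n in counts.values() if n >= 2)
        if states_seen_twice >= 2:
            informative += 1
    return informative


if __name__ == "__main__":
    toy_locus = ["AAGT", "AAGC", "ATCT", "ATCC"]  # hypothetical toy alignment
    print(count_parsimony_informative_sites(toy_locus))
```

In the toy alignment the first column is constant, while each of the other three columns has two states that each occur twice, so three sites are counted. Comparing such counts per locus is one simple way to rank candidate targets by informativeness, although the abstract's second measure, QIRP, requires tree-based calculations that this sketch does not attempt.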

Version published to 10.3897/phytokeys.251.136373
Jan 16, 2025
Version published to 10.3897/arphapreprints.e138316
Oct 2, 2024
