Genomic Resources for the North American Water Vole (Microtus richardsoni) and the Montane Vole (Microtus montanus)

Abstract

Voles of the genus Microtus are important research organisms, yet genomic resources are lacking. Such resources would benefit future studies of immunology, phylogeography, cryptic diversity, and more. We sequenced and assembled nuclear genomes from two subspecies of water vole (Microtus richardsoni) and from the montane vole (Microtus montanus). The water vole genomes were sequenced with Illumina and 10× Chromium plus Illumina sequencing, resulting in assemblies with ∼1,600,000 and ∼30,000 scaffolds, respectively. The montane vole was also assembled into ∼13,000 scaffolds using Illumina sequencing. Mitochondrial genome assemblies were also performed for both species. Structural and functional annotation for the best water vole nuclear genome resulted in ∼24,500 annotated genes, with 83% of these having functional annotations. Assembly quality statistics for our nuclear assemblies fall within the range of genomes previously published in the genus Microtus, making the water vole and montane vole genomes useful additions to currently available genomic resources.

This work has been peer reviewed in GigaByte, which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

Reviewer 1. Aaron Shafer.

Is there sufficient detail in the methods and data-processing steps to allow reproduction?

No. I would include all flags for assemblies even if default; it is unclear how the 10x + Illumina data were integrated (if at all); see comments below.

Is there sufficient data validation and statistical analyses of data quality?
Yes. I suppose BUSCO and gene number are a form of validation.

Is there sufficient information for others to reuse this dataset or integrate it with other data?
No. See comment below; while the short-read data are great, I would likely reassemble the genomic resource for a variety of reasons outlined in Additional Comments.

Any Additional Overall Comments to the Author: The paper is well written, and I have no comments about the content; well done here. My main concern lies with the genome resources, and in this case I would likely use the raw data rather than the assemblies provided. I offer my rationale and suggestions:

My lab was heavily pushed by a colleague towards the use of Meraculous in our short-read assembly of mammal genomes (https://jgi.doe.gov/data-and-tools/meraculous/); this is because it is really designed for short-read assemblies of big genomes (i.e. no addition of mate-pair) AND it performs very well in the Assemblathon metrics (https://academic.oup.com/gigascience/article/2/1/2047-217X-2-10/2656129); notably, in Figures 16-18 you start to see clear differences between Meraculous and, say, SOAPdenovo. Thus, for just the Illumina data, I would very much like to see a more appropriate assembly explored, as stats like N50 and number of scaffolds will likely improve considerably with the appropriate methods.
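For reference, the N50 statistic mentioned above can be computed directly from a list of scaffold lengths. The following is a minimal sketch, not the authors' or the reviewer's pipeline; the function name `n50` and the example lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the scaffold at which, going from the longest
    scaffold downwards, the running total first reaches at least half of
    the total assembly length.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (bp), not taken from the manuscript.
example_lengths = [100, 80, 50, 30, 20, 10, 10]
print(n50(example_lengths))
```

A more fragmented assembly of the same total length has more, shorter scaffolds and therefore a lower N50, which is why the two statistics are usually reported together.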

Likewise, it is very unclear in the methods how M. r. arvicoloides was assembled: I see Supernova for the 10X data (great), and probably SOAPdenovo for the Illumina data (see above). But how were they combined? This sequencing strategy is really designed for a hybrid assembly (see, for example, DBG2OLC, https://github.com/yechengxi/DBG2OLC); this is appropriate for 10X data and really does work! But there are others.

Note that M. agrestis, which has an identical sequencing strategy to M. r. arvicoloides, has only ~3% of the total scaffolds; follow whatever they did! And I will say, while the authors state their genome is on par with other Microtus, and this appears true by Table 3, only M. agrestis currently has an assembly that I think is at current standards. The level of fragmentation and low BUSCO scores really support revisiting the assembly along the lines suggested, as I think the current .fasta will be of limited utility in a population or comparative genomics study.

The gene number is pretty high for a mammal and I worry that is due to fragmentation. It would be reasonable to annotate only scaffolds >10 kb or >50 kb, but then there is not much of a genome left. Ideally the bulk of your genome (>>90%) would fall on these scaffolds. There is really no sense in annotating your small fragments (have you tested for contamination? Note that NCBI will do this before allowing the assembly to be deposited, so I suggest it).
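The check suggested above, how much of the assembly lies on scaffolds longer than a cutoff, can be sketched as follows. This is an illustrative example only; the function name `fraction_in_long_scaffolds` and the scaffold lengths are hypothetical, and the 10 kb cutoff is simply the lower of the two values mentioned in the comment.

```python
def fraction_in_long_scaffolds(lengths, min_length):
    """Return the fraction of total assembly length that lies on
    scaffolds strictly longer than min_length (both in bp)."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly has zero total length")
    kept = sum(length for length in lengths if length > min_length)
    return kept / total


# Hypothetical scaffold lengths (bp), not taken from the manuscript.
example_lengths = [50_000, 20_000, 8_000, 2_000]
print(fraction_in_long_scaffolds(example_lengths, 10_000))
```

A value well above 0.9 would indicate that restricting annotation to the long scaffolds discards little of the genome.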

You also align your data to the mt genome; this is different from assembling it. You could assemble it (e.g. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1927-y), and it might be interesting to see if there are any differences.

I wish I could be more positive; an assembly with Meraculous would take a week or so, and so would the hybrid approach, but it would be worth it based on my experience with these data.

Recommendation: Major Revision

Reviewer 2. Joana Damas.

Any Additional Overall Comments to the Author:
The genomes presented in this work will be an extremely valuable tool for Microtus-related research. The manuscript is very clear and easy to follow. I have, however, a couple of comments that I hope will further improve it.

(1) Line 123: I believe more details on the measures used for the selection of the best M. r. macropus assembly are needed. Even though the contiguity of the Discovar genome assembly is higher than the ones generated with SOAPdenovo, the BUSCO score is relatively low (e.g. 54.5% versus 84% in M. r. arvicoloides). Were the BUSCO scores for the other assemblies even lower? Is the Discovar assembly size closer to the estimated genome size?

(2) Line 131/251: Was there any genome structure verification step for the M. montanus genome assembly? For instance, what percentage of the Illumina reads could be mapped back to the finished genome assembly?

(3) Line 131/251: Was there a reason not to use a published reference-guided assembly method (e.g. RaGOO and those listed therein) for the assembly of the M. montanus genome? These could maybe further improve the assembly or help identify misassemblies.

(4) Line 180: The large difference between BUSCO scores for the two M. richardsoni subspecies makes me believe that the completeness of the genomes is quite different, that the fraction of the genome within repeats might be underrepresented in M. r. macropus, and that the subspecies values might be closer than noted here. It is, however, difficult to discern phylogenetic relatedness from Fig. 1 for the other species, for non-experts such as myself. It would be helpful to have a phylogeny next to the graph showing species relationships.

(5) Please verify Tables 1 and 2. The statistics presented for M. r. macropus do not match for N50 and longest scaffold size.

Recommendation: Minor Revision
