De novo chromosome-length assembly of the mule deer (Odocoileus hemionus) genome

Abstract

The mule deer (Odocoileus hemionus) is an ungulate species distributed from western Canada to central Mexico. Mule deer are an essential source of food for many predators, are relatively abundant, and commonly make broad migratory movements. A clearer understanding of the mule deer genome can improve our knowledge of its population genetics, movements, and demographic history, aiding conservation efforts. Their large population size, continuous distribution, and diversity of habitat make mule deer excellent candidates for population genomics studies; however, few genomic resources are currently available for this species. Here, we use long-read sequencing and Hi-C technologies to sequence and assemble the mule deer genome into a highly contiguous chromosome-length assembly for use in future research. We also provide a genome annotation and compare the demographic histories of the mule deer and white-tailed deer using the pairwise sequentially Markovian coalescent (PSMC) model. We expect this assembly to be a valuable resource in the continued study and conservation of mule deer.

Reviewer 2. Dr. Rebecca Taylor

Are all data available and do they match the descriptions in the paper? Yes. I checked the two links included in the text as well as the NCBI data availability, and all data are available for download with good explanations of what the files are.

Are the data and metadata consistent with relevant minimum information or reporting standards?
No. On the whole, the information included for the data and metadata is good, but some more information about the sample used would be beneficial. It is stated that the sample came from 'Woodland Hills, Utah'. I assume this is in the United States? Some more information about the environment would help those unfamiliar with the area. It is also not stated which subspecies the sample belongs to. A map of the species range showing where this sample is from could also help. Additionally, in the 'Background and context' section, you state that 'genetic resources available for Odocoileus spp. are limited to a variety of microsatellite loci', with the exception of Russell et al. I think you need to do a more thorough search. For example, I have used a Sitka deer genome (Odocoileus hemionus sitkensis) as an outgroup for my work; it was sequenced by the CanSeq150 program and is available on NCBI under BioProject PRJNA476345.

Is the data acquisition clear, complete and methodologically sound? No. I was a bit confused by the sentence 'The assembled mule deer genome has a total length of 2.61 Gbp with a GC content of 41.8% and a contig N50 of 28.6 Mbp (Table 1) with a longest contig of roughly 96.5 Mbs' occurring before the 'Chromosome-length Scaffolding' section. Are these assembly statistics for the version of the genome before chromosome scaffolding? It would be better to report the final assembly statistics for the chromosome-scale assembly or, if these are the final statistics, to move those results to after the 'Chromosome-length Scaffolding' section. For Table 1, it might be nice to include the N/L90. The most standard statistics are the N/L50 and the N/L90, so you don't necessarily need all of the others. Additionally, I could not find where it is stated how many 'chromosomes' are in the final assembly. Is this a chromosome assembly or (more likely) a chromosome-scale assembly (so there are also other scaffolds not included in the main 'chromosomes')? A recently published paper that may be worth referencing: Yamaguchi et al. Technical considerations in Hi-C scaffolding and evaluation of chromosome-scale genome assemblies. Molecular Ecology. I also think it would be beneficial to know what the coverage of the files used for the PSMC analysis was, making sure to filter out sites with more than roughly double the average depth (I can see that you filtered for a maximum of 90X, but I don't know whether this is appropriate or not). A citation for the generation time used would also be beneficial, as this strongly influences PSMC results. It would also be good to explain the recent rise in effective population size seen in the white-tailed deer. Is this a real pattern, or does it arise because PSMC can be spurious at more recent times? This is another reason why it would be good to know the depth of both files used here. Could the different demographic histories be caused by competition between the species?
I am not an expert on these species, but I was curious given their contrasting demographic histories.
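
For readers less familiar with the N/L50 and N/L90 statistics mentioned above, the following is a minimal Python sketch of how they are commonly computed from a list of contig or scaffold lengths. The function name nx_lx and the toy lengths are hypothetical and chosen for illustration only; they are not taken from the manuscript or its assembly.

```python
def nx_lx(lengths, x=50):
    """Return (Nx, Lx) for a list of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least x% of the total
    assembly length; Lx is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)   # longest first
    threshold = sum(ordered) * x / 100        # bases that must be covered
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count


# Hypothetical lengths in bp, for illustration only (not real assembly data).
toy_lengths = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy_lengths, 50)
n90, l90 = nx_lx(toy_lengths, 90)
print(n50, l50, n90, l90)  # prints: 80 2 40 4
```

Reporting both pairs is informative because N/L50 describes the bulk of the assembly, while N/L90 shows how fragmented the remaining sequence is.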

Is there sufficient data validation and statistical analyses of data quality? Not my area of expertise.

Is there sufficient information for others to reuse this dataset or integrate it with other data? No. As I stated above, it would be beneficial to state in the text which subspecies this individual belongs to.

Recommendation: Minor revision
