Patterns of aDNA damage through time and environments—lessons from herbarium specimens
Abstract
Background
Herbarium collections are a vast but underutilized resource for ancient DNA research, containing over 400 million specimens with detailed metadata and spanning centuries of global biodiversity. Understanding patterns of DNA preservation in natural collections is crucial for optimizing ancient DNA studies and informing future curation practices. We analysed genomic data for 573 herbarium specimens from 6 plant species from the genera Hordeum and Oryza collected from the Americas and Eurasia over 220 years. Using standardized laboratory protocols and shotgun sequencing, we quantified DNA degradation and elucidated factors that accelerate it.
Results
We find significant age-dependent DNA fragmentation rates, indicating temporal degradation processes not detected in prehistoric samples. In our analysis, DNA decay rates in herbarium specimens were almost 8 times faster than in moa bones, reflecting fundamental differences in tissue composition and preservation environments. Environmental conditions at the time of specimen collection emerged as the major determinants of post-mortem damage rates, with the interaction term between temperature and genus being the dominant driver of cytosine deamination. We find no effect of sample storage on DNA damage and degradation.
Conclusions
These findings provide insights into how climatic origin, preservation environment, taxonomic identity, and age influence DNA preservation while highlighting opportunities for improving institutional preservation practices. Due to standardized preservation conditions, museum collections can provide better insights into DNA damage and degradation over time than archaeological and paleontological samples.
Article activity feed
This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giag026), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:
Reviewer 3:
I read this work with great interest, and I believe it represents an excellent contribution to our understanding of aDNA preservation, particularly welcome for plants, since most studies in this field are carried out on animal tissues, bones, and similar materials. The authors show that ancient DNA (aDNA) damage in herbarium specimens results from a combination of temporal, environmental, and biological factors, with storage conditions affecting decay rates. Their results indicate that DNA fragmentation in dry plant tissue increases with sample age, that it varies between genera, and that temperature is the main driver of cytosine deamination. I agree with these interpretations, but the discussion could place more emphasis on the roles of water and oxidation in DNA degradation. Rapid drying of herbarium specimens limits hydrolytic damage but may increase oxidative processes; by contrast, animal or arthropod specimens dry more slowly, and this allows different degradation dynamics. Considering these differences in the discussion could further clarify the mechanisms behind the observed patterns, especially across museum tissue types.
In the study, the methodologies are solid. The approach used to estimate endogenous DNA content is appropriate, though applying a mapping quality threshold could strengthen the calculation. The methods for assessing DNA fragmentation, DNA damage, decay rates, and 5' C→T substitutions seem robust and optimal for validating aDNA authenticity. The climate analyses also appear sound, but I cannot provide a detailed evaluation of this part owing to limited expertise in this area.
The explanation for the correlation between fragment length and sample age seems logical. Unlike in animals, where DNA decay occurs in two phases, plant tissue death is gradual and varies by tissue, which allows enzymatic and microbial degradation to continue over longer periods, contributing to the strong age-fragmentation relationship. Overall, the study highlights the importance of tissue type and storage conditions for DNA decay; however, discussing how hydrolytic and oxidative processes differ between herbarium plants and other specimen types (e.g., animal) would further strengthen the interpretation of the decay rates.
Specific comments
The terminology related to ancient DNA preservation (e.g., DNA damage, DNA degradation, DNA decay) should be clarified and used more consistently throughout the text. These terms describe distinct processes, and specifying the intended meaning for each will improve precision and avoid confusion for the reader. DNA damage refers to specific chemical lesions; DNA degradation describes the physical fragmentation of DNA molecules; and DNA decay refers to the temporal process or rate at which DNA deteriorates over time.
The two most prominent reactions associated with DNA degradation are deamination (resulting with spontaneous substitutions of cytosine residues to uracil) and depurination (breakage of the phosphodiester bond resulting in DNA backbone fragmentation). In view of the comment above on the terminology used, I believe that the sentence above conflates different processes: deamination is a form of DNA damage, whereas depurination leads to DNA degradation through strand fragmentation. I suggest the terminology in the paper should be modified to reflect this distinction. Even if the authors do not wish to adopt this terminology I suggest that they clarify the terms more clearly at the beginning.
Line 106: …six plant species, spanning…
Lines 98, 105: In this context, it is not appropriate to refer to deamination-induced substitutions as "mutations," since they represent post-mortem chemical damage rather than random biological changes (mutations) that occur in vivo. In addition, introducing this new term further complicates the terminology discussed in the previous comments.
Line 116-118: I wonder if the sampling coverage for Hordeum, with highest counts in arid and warm regions, may be incomplete, as certain regions, such as northern Europe (e.g., Scandinavia) or Russia are not represented. These species are cultivated in Russia, Denmark, southern Sweden, I believe. Should this limitation be acknowledged as it could affect the generality of the conclusions especially regarding temperatures?
It is unclear why the study included only wild Oryza species (O. alta, O. grandiglumis, O. latifolia, O. rufipogon), whereas for Hordeum the cultivated Hordeum vulgare was used. Perhaps including Oryza sativa could provide more information on DNA preservation in domesticated material and allow a more consistent comparison across genera?
Table 1: Draw a line above the last row (Total)
Line 140: Oryza should be in italics
Line 140: Why 58 Oryza (30 O. latifolia, 18 O. rufipogon and 10 O. grandiglumis)? Why not all Oryza samples?
From line 169, it appears that an additional 287 Oryza samples from different origins (KAUST) were used, but it is not clear (not explained) if these are herbarium specimens, and why this origin (KAUST) is not included in Table 1. Perhaps it would be better to explain at the beginning of this paragraph that there are two subsets of samples and to clarify the content of Table 1 more clearly.
Line 143: it is not specified which part of the herbarium material was used. I assume leaves, but this should be clearly stated
Line 149: Please clarify what "gDNA" refers to; genomic DNA? Since you spell out "genomic DNA" elsewhere in the paragraph, the abbreviation here seems unnecessary.
Line 149: Why was only a subset used? Please explain and provide a rationale.
Line 154: were the libraries constructed only on this subset as well?
Line 162: Fragment size: The first letter of the sentence should be capitalized.
Lines 165-169: It is not clear to me how the different subsets of samples were used in this study. Here it is stated that all barley samples (but how many exactly?) were sequenced on NovaSeq in a specific place, whereas only 40 rice samples (from the initial subset of how many?) were sequenced on another NovaSeq platform and at a different institute. Also, the 287 samples from KAUST were sequenced on a MiSeq, which has lower output than NovaSeq. It is necessary to explain how the initial 573 samples were selected and used for all analyses. Also, the 287 samples from KAUST were processed in an ancient DNA lab, but what about all the other samples? It would be strange if a specialized laboratory for ancient DNA analyses was not used for all samples. In this regard, it should also be noted that the issue of contamination is not mentioned in the manuscript, although it was certainly considered by the authors; for example, the authors could indicate whether negative controls (blank samples) were used and how they were processed. Certainly, the C>T signal ensures that we are dealing with authentic ancient sequences, but this should be highlighted and explained more clearly.
Line 189: Why was it aligned to Oryza glumipatula (a new species not mentioned before?) and not against Oryza rufipogon? The authors report measuring gDNA fragment size distributions on a subset of 40 samples. It would be helpful if they could provide a motivation for why this subset was chosen, and how it is representative of the full dataset, to clarify the rationale behind not analyzing all samples.
Reviewer 2:
Reproducibility report for: Patterns of aDNA Damage Through Time and Environments - lessons from herbarium specimens
Journal: Gigascience
ID number/DOI: GIGA-D-25-00447
Reviewer(s):
Laura Caquelin, Department of Clinical Neuroscience, Karolinska Institutet, Sweden [Wrote the report and reproduced the results]
Gustav Nilsonne, Department of Clinical Neuroscience, Karolinska Institutet, Sweden [Reviewed the final report]
- Summary of the study
The authors evaluated DNA preservation in herbarium collections by analyzing genomic data from 573 specimens of Hordeum and Oryza. They quantified DNA degradation and identified factors affecting decay, finding that specimen age and environmental conditions strongly influence DNA preservation.
- Scope of reproducibility
According to our assessment the primary objective is: the regression analyses of aDNA damage metrics for Hordeum and Oryza.
Outcome: "Four metrics were selected to quantify patterns of aDNA damage: (i) the proportion of endogenous DNA content, (ii) the fragment length distribution, (iii) the damage fraction per site (λ), and (iv) the frequencies of 5' C>T substitutions." (lines 197-199)
Analysis method outcome: "The four metrics were analysed in linear models as a function of collection year and sample age using the 'lm' function in R" (lines 199-200)
Main result: The results of this outcome are presented in figure 2 "Regression analyses of aDNA damage metrics for Hordeum and Oryza" and in the related text lines 302 to 361 in the "Regression analysis" section: "Endogenous fraction […] The regression analyses revealed no statistically significant relationship between the proportion of endogenous DNA and the sample collection year in Hordeum (R² = 0.003, p = 0.451, N = 211), but a very weak yet significant relationship was observed in Oryza (R² = 0.04, p = 0.00167, N = 245; figure 2a).
Fragment length […] We observed a statistically significant relationship between the log-mean fragment length and the sample collection year for both genera (figure 2b), with a stronger relationship for Hordeum (R² = 0.27, p = 5.33 × 10⁻¹⁶, N = 211) than Oryza (R² = 0.112, p = 8.58 × 10⁻⁸, N = 245).
Damage fraction per site (λ) and DNA decay rate (k) […] We estimated the DNA decay rate per year (k) for Hordeum and Oryza from the slope of the linear relationship between λ and sample age (figure 2c). We observed a per nucleotide decay rate of k = 2.64 × 10⁻⁴ per year for Hordeum (R² = 0.208, p = 3.27 × 10⁻¹², N = 211), which was 1.5 times faster than the decay rate of Oryza of k = 1.79 × 10⁻⁴ per year (R² = 0.101, p = 3.65 × 10⁻⁷, N = 245) […].
Nucleotide misincorporations […] (figure 2d), with Oryza starting from a higher baseline of damage when compared to Hordeum and displaying a stronger relationship (R² = 0.303, p = 8.62 × 10⁻²¹, N = 245 for Oryza, and R² = 0.207, p = 3.63 × 10⁻¹², N = 211 for Hordeum, respectively). […]"
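To make the quoted decay-rate estimate concrete, the following is a minimal sketch of how a slope k can be read off a least-squares line of λ against sample age. It is written in Python rather than the R `lm` call the manuscript reports, the helper name `ols_fit` is hypothetical, and the ages and λ values are noise-free toy numbers built from the quoted Hordeum rate, not the study's data.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical toy data: lambda grows linearly with age at the quoted Hordeum rate.
K_HORDEUM = 2.64e-4  # per year, as quoted above
K_ORYZA = 1.79e-4    # per year, as quoted above
ages = [10, 50, 100, 150, 200]
lambdas = [0.01 + K_HORDEUM * age for age in ages]  # 0.01 is an arbitrary baseline

intercept, k_est, r2 = ols_fit(ages, lambdas)
ratio = K_HORDEUM / K_ORYZA  # the "1.5 times faster" comparison

print(f"k = {k_est:.3e} per year, R^2 = {r2:.3f}, Hordeum/Oryza = {ratio:.2f}")
```

With the two quoted rates the ratio evaluates to about 1.47, which is consistent with the "1.5 times faster" wording in the quoted text.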
- Availability of Materials
a. Data
- Data availability: Raw data are not yet publicly available but uploaded in NCBI database. Processed data are shared via the private journal dropbox, and the intermediate file is available on the GitHub repository.
- Data completeness: Complete processed data and intermediate file (all data necessary to reproduce main results are available).
- Access Method: Private journal dropbox and GitHub repository
- Repository: https://github.com/Stefano-Porrelli/Herbaria_aDNA_Damage
- Data quality: Structured
b. Code
- Code availability: Open
- Programming Language(s): R and Bash
- Repository link: https://github.com/Stefano-Porrelli/Herbaria_aDNA_Damage
- License: MIT license
- Repository status: Public
- Documentation: Clear Readme file. Additional details may be required to run the Bash code.
- Computational environment of reproduction analysis
- Operating system for reproduction: MacOS 15.7.2
- Programming Language(s): R
- Code implementation approach: Using shared code
- Version environment for reproduction: R version 4.5.1/RStudio 2025.05.1
- Results
5.1 Original study results
- Results 1: See screenshot figure 2:
5.2 Steps for reproduction
-> Run 01_Plant_aDNA_screening_prep.sh
- Issue 1: The reviewer link provided for the bioprojects on NCBI did not allow downloading. -- Partial resolution: An email was sent to the authors requesting access to the raw data or sharing processed data and intermediate files. Processed data were shared via the private journal dropbox and intermediate file (aDNA_damage_screening_MAIN.txt) was shared both on the dropbox and the GitHub repository.
The authors contacted NCBI to enable downloading the raw data with the reviewer link, but no response has yet been received. As the review needed to be performed within a set timeframe, the computational reproducibility review was performed first using the processed data and then directly with the intermediate file.
Note: The two bash scripts were not run. Additional guidelines would be helpful for running these scripts, especially regarding terminal commands and manual steps (changing the repository name or the link to the data for example).
-> Run the analysis from the processed data shared --> Run code aDNA_Dmg_Script00_collate_screening_results.r
- Issue 2: The code expects data organized in two sub-folders: 4_mapping and 5_aDNA_characteristics. Processed data were received in several species-specific folders, each containing 4_mapping and 5_aDNA_characteristics. -- Resolved: All data were merged manually into single 4_mapping and 5_aDNA_characteristics folders to match the script's requirements. This detail should be added to the readme file.
- Issue 3: The sample_metadata.txt file was not correctly merged with the results dataframe. Many columns (Batch_no to X) in aDNA_damage_screening_MAIN.txt contained NA values. -- Resolved: A message was sent to the authors to resolve the issue. They updated both sample_metadata.txt and aDNA_damage_screening_MAIN.txt on GitHub. Author's response: "I have realised the problem stems from inconsistencies between sample naming conventions in the screening output directories and the sample identifiers in the metadata file. Specifically, for the Hordeum samples, the directories are named using library IDs rather than the short sample names, and some of the Oryza samples were missing their expected suffixes. This meant the left_join step failed to match metadata for those samples. Thank you for flagging this up. I have now corrected this by updating the "Sample" column in the metadata file to reflect the actual directory names used in the screening outputs. The original short names are preserved in a "Sample_ID" column. I have uploaded the corrected sample_metadata.txt file to the GitHub repository, and also updated the aDNA_damage_screening_MAIN.txt dataset on the GitHub repo to reflect these changes. I have re-run the pipeline and it now works correctly. Please let me know if you encounter any further issues, and thank you again for catching this."
The reproduced aDNA_damage_screening_MAIN.txt file no longer contains NA values.
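The failure mode described in Issue 3 (a left join that silently leaves metadata empty when keys do not match) can be caught with an explicit check for unmatched keys. The sketch below is illustrative only: it uses plain Python dictionaries instead of the authors' R `left_join`, and every sample name, column value, and the helper `left_join_with_report` is hypothetical. Only the column names "Sample" and "Sample_ID" come from the author response above.

```python
def left_join_with_report(results, metadata, key="Sample"):
    """Left-join metadata onto results and list result keys with no metadata match."""
    meta_by_key = {row[key]: row for row in metadata}
    joined, unmatched = [], []
    for row in results:
        meta = meta_by_key.get(row[key])
        if meta is None:
            unmatched.append(row[key])   # metadata columns would be NA for this row
            joined.append(dict(row))
        else:
            joined.append({**meta, **row})
    return joined, unmatched


# Hypothetical screening results keyed by directory name (library ID or suffixed name).
results = [
    {"Sample": "LIB_0001", "lambda": 0.031},
    {"Sample": "ORY_17_a", "lambda": 0.045},
]

# Metadata keyed by short sample names: nothing matches.
metadata_before = [
    {"Sample": "HV01", "Genus": "Hordeum"},
    {"Sample": "ORY_17", "Genus": "Oryza"},
]
_, unmatched_before = left_join_with_report(results, metadata_before)

# Metadata after the kind of fix described: "Sample" holds the directory name,
# and the original short name is kept in "Sample_ID".
metadata_after = [
    {"Sample": "LIB_0001", "Sample_ID": "HV01", "Genus": "Hordeum"},
    {"Sample": "ORY_17_a", "Sample_ID": "ORY_17", "Genus": "Oryza"},
]
joined_after, unmatched_after = left_join_with_report(results, metadata_after)

print("unmatched before:", unmatched_before)
print("unmatched after:", unmatched_after)
```

Failing loudly when the unmatched list is non-empty, rather than carrying the empty columns forward, is one way to address the recommendation on consistent sample naming.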
--> Run code aDNA_Dmg_Script02_Regressions.r: The script was run without any issues.
-> Run the analysis from the intermediate data file shared on Github --> Run code aDNA_Dmg_Script02_Regressions.r: Run the code after renaming the file to aDNA_damage_screening_MAIN_shared.txt.
5.3 Statistical comparison Original vs Reproduced results
Reproduced results:
-- Using the processed data and the reproduced aDNA_damage_screening_MAIN.txt, the results of Figure 2 were successfully reproduced (see screenshots below).
-- Using the shared aDNA_damage_screening_MAIN.txt from GitHub, the results were also successfully reproduced (see screenshots below).
Comments: Supplementary Figure 1 was also reproduced using the same code. We confirmed that the reproduced values match the original results. Both the processed data and the intermediate data file reproduced Supplementary Figure 1 (see screenshots below).
Errors detected: One reporting error was detected in the "Fragment length" section (line 336): the p-value for Oryza should be 8.47 × 10⁻⁸, not 8.58 × 10⁻⁸ as reported in the text.
Statistical Consistency: All statistical results reproduced from both the processed data and the intermediate file are identical to those reported in the manuscript (see Comparison_reproduced_vs_original.csv and Comparison_two_reproductions.csv files attached with this report).
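A comparison of reported and reproduced statistics of the kind summarised here can be automated by rounding both values to the precision used in the manuscript. The sketch below is an assumption about how such a check might look, not the content of the attached CSV files: the helper `same_at_reported_precision` is hypothetical, and the value pairs are taken from the figures quoted in this report, with the single discrepancy noted under "Errors detected".

```python
import math


def same_at_reported_precision(reported, reproduced, sig_digits=3):
    """True if both values agree after rounding to sig_digits significant digits."""
    def round_sig(value):
        if value == 0:
            return 0.0
        exponent = math.floor(math.log10(abs(value)))
        return round(value, sig_digits - 1 - exponent)
    return math.isclose(round_sig(reported), round_sig(reproduced), rel_tol=1e-9)


# (label, value reported in the manuscript, value obtained on reproduction)
comparisons = [
    ("Hordeum fragment length p", 5.33e-16, 5.33e-16),
    ("Oryza fragment length p", 8.58e-8, 8.47e-8),   # the reporting error noted above
    ("Hordeum decay rate k", 2.64e-4, 2.64e-4),
    ("Oryza decay rate k", 1.79e-4, 1.79e-4),
]

mismatches = [label for label, reported, reproduced in comparisons
              if not same_at_reported_precision(reported, reproduced)]
print("mismatches:", mismatches)
```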
- Conclusion
Summary of the computational reproducibility review
The computational reproducibility review shows that the results in Figure 2 and related text of the original study were fully reproducible using both the processed data and the intermediate data file shared (aDNA_damage_screening_MAIN.txt). The statistical results reproduced are identical to those presented in the manuscript. One minor reporting error was detected in the manuscript: the p-value for Oryza in the "Fragment length" section should be 8.47 × 10⁻⁸ instead of 8.58 × 10⁻⁸.
Recommendations for authors
-- Provide clearer instructions for running the Bash scripts, including terminal commands and any manual steps.
-- Ensure consistent sample naming across metadata files and data directories to avoid merging issues for all analysis/scripts.
-- Consider making raw data publicly available or provide clear guidance for reviewers to access it.
-- Maintain clear documentation of file structure to facilitate future reproducibility.
Reviewer 1:
The manuscript by Stefano Porrelli and colleagues makes a valuable contribution by scaling up previous work on DNA damage in plant herbarium specimens and by exploring how collection environments influence patterns of aDNA degradation. The authors present a large-scale analysis of DNA damage in 573 specimens from six Hordeum and Oryza species spanning ~220 years and diverse climates. Using standardized ancient DNA protocols, shotgun sequencing, and high-resolution climate data, they model the effects of specimen age, collection environment, genus, and herbarium of origin on DNA fragmentation, decay rates, and cytosine deamination.
The study robustly confirms that DNA fragmentation and λ are strongly age-dependent, that herbarium specimens exhibit decay rates intermediate between bones and arthropods, and that environmental factors (particularly temperature) appear to correlate with 5′ C→T damage when all samples are analysed together. At the same time, some aspects of the temperature interpretation, especially in relation to genus-level structure, merit further clarification (as detailed below). Storage conditions (herbarium identity) seem to have comparatively minor influence.
Overall, I enjoyed reading this research: the dataset is rich, the methodological framework is strong, and the work has significant potential to become a reference for understanding plant aDNA preservation in herbaria. I believe the paper merits publication, though several concerns should be addressed prior to its acceptance. Please find below several points that I hope will help strengthen and refine the manuscript.
Major comments
Definition and calculation of endogenous DNA fraction
You define endogenous fraction as "the percentage of post-quality trimmed and merged reads for each sample mapped to its respective reference" (lines 203-206) and say it "was calculated with SAMtools 'flagstat'" (line 206). However, this is somewhat ambiguous:
Is the denominator the number of merged reads after AdapterRemoval, the total raw reads, or only non-duplicate mapped reads?
Do you include secondary/supplementary alignments (multi mappers), and how are PCR duplicates treated here?
Given that endogenous fraction is one of your four key metrics (Methods, lines 197-200), it would be useful to make this completely explicit.
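The ambiguity raised above matters numerically, because the same mapping result gives different percentages depending on the denominator and on how secondary/supplementary alignments and duplicates are counted. The sketch below illustrates this with entirely hypothetical read counts; the dictionary keys and the function name are assumptions for illustration and do not reflect how the authors actually computed the metric.

```python
def endogenous_fraction(mapped, denominator):
    """Percentage of reads counted as endogenous for a given numerator and denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * mapped / denominator


# Hypothetical counts for a single library.
counts = {
    "raw_reads": 1_000_000,              # before trimming and merging
    "merged_reads": 800_000,             # after adapter trimming and merging
    "primary_mapped": 400_000,           # primary alignments only
    "secondary_supplementary": 20_000,   # extra alignment records
    "duplicates": 40_000,                # PCR duplicates among primary alignments
}

definitions = {
    "primary mapped / merged reads":
        endogenous_fraction(counts["primary_mapped"], counts["merged_reads"]),
    "primary mapped / raw reads":
        endogenous_fraction(counts["primary_mapped"], counts["raw_reads"]),
    "deduplicated primary / merged reads":
        endogenous_fraction(counts["primary_mapped"] - counts["duplicates"],
                            counts["merged_reads"]),
    "all alignment records / merged reads":
        endogenous_fraction(counts["primary_mapped"] + counts["secondary_supplementary"],
                            counts["merged_reads"]),
}

for name, value in definitions.items():
    print(f"{name}: {value:.1f}%")
```

In this toy case the four definitions give 50.0%, 40.0%, 45.0% and 52.5% for the same library, which is why stating the exact numerator and denominator in the Methods would help.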
Need a better explanation of the "month of collection" variable
Lines 266-273: you state that monthly temperature and precipitation were extracted "to infer climatic conditions at the time of specimen collection" and that in the collection climate model variables were assigned "based on their location and month of collection." Later, in the Results you again refer to "collection climate" and "annual climate" models (lines ~438-441).
However, it is not entirely clear whether month is explicitly included as a variable (e.g. as a categorical factor or via the corresponding monthly raster) or whether you simply used the CHELSA monthly layer corresponding to the recorded month. Please clarify in the Methods how the month of collection enters the model. Is there a variable "month" per se, or is the only effect that you choose the relevant tas_XX and pr_XX layer?
This would make it much easier for readers to follow how "month" is used and what the collection climate actually represents.
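One of the two readings described above, in which month only selects which monthly layer is sampled and never appears as a model term, can be sketched as follows. This is an illustration of the question, not the authors' implementation: the layer names follow the tas_XX / pr_XX pattern mentioned above with zero-padded months as an assumption, and the locality values, specimen record, and function name are hypothetical.

```python
def monthly_layer_names(month):
    """Return the (temperature, precipitation) layer names for a calendar month."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    return f"tas_{month:02d}", f"pr_{month:02d}"


# Hypothetical values already extracted at one collection locality.
layer_values = {"tas_01": 3.5, "pr_01": 48.0, "tas_07": 24.1, "pr_07": 12.0}

specimen = {"id": "EXAMPLE_1", "month": 7}  # hypothetical specimen collected in July
tas_name, pr_name = monthly_layer_names(specimen["month"])

# Month is used only to pick the layers; it is not itself a predictor in this row.
model_row = {
    "collection_temp": layer_values[tas_name],
    "collection_precip": layer_values[pr_name],
}
print(tas_name, pr_name, model_row)
```

Under the alternative reading, the model row would also carry month as a categorical factor alongside the two climate values.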
Need a clarification of "Collection Climate" vs. Herbarium Storage
In the Methods (lines ~271-274), you describe a collection climate model where "monthly climatic variables (temperature and precipitation) were assigned to samples based on their location and month of collection," and an annual climate model based on annual means at the collection location. However, it is not clearly stated how this model relates to the actual time each specimen spent in the field vs. in herbarium storage. By definition, a 150-year-old specimen will have spent the majority of its lifetime in a collection, yet the climate used in the models is that of the collection locality at the time of sampling, not the climate of the herbarium building where it spent decades, despite the herbarium being included as a factor.
Could you please clarify explicitly what period of a specimen's "life after death" you intend to capture with the collection climate model? Is it mainly the drying/early post-mortem period, or are you also considering longer-term storage conditions in the herbarium? Do you assume that most deamination and oxidative damage occur in the first days to months after collection, and that later storage in relatively stable herbarium conditions contributes little to further degradation?
Need for the integration of non-deamination mismatch controls and baseline divergence
Your analysis focuses on the aDNA-typical 5′ C→T misincorporations (Methods, lines 238-245; Results, lines 355-361). However, you do not show any other mismatch frequencies (e.g. A→G, G→A etc) as a "negative control" to demonstrate that the patterns you report (exponential decay, climate, age, genus effects) are specific to deamination rather than general elevation of error rates or mapping artefacts.
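The negative control requested here amounts to tabulating every substitution type at the 5′ terminal position, not only C→T. A minimal sketch of that tabulation is given below, assuming the reference base and read base at the first position have already been extracted for each aligned read; the function name and the toy pairs are hypothetical, and real pipelines would derive these pairs from the alignments.

```python
from collections import Counter


def first_base_substitution_freqs(pairs):
    """Frequency of each substitution type at the 5' terminal position.

    pairs: iterable of (reference_base, read_base) tuples.
    Each frequency is relative to the number of reference bases of that type.
    """
    ref_totals = Counter()
    substitutions = Counter()
    for ref, read in pairs:
        ref_totals[ref] += 1
        if ref != read:
            substitutions[f"{ref}>{read}"] += 1
    return {sub: n / ref_totals[sub[0]] for sub, n in substitutions.items()}


# Hypothetical toy data: elevated C>T, low A>G, no other mismatches.
pairs = (
    [("C", "T")] * 3 + [("C", "C")] * 7
    + [("A", "G")] * 1 + [("A", "A")] * 9
    + [("G", "G")] * 10 + [("T", "T")] * 10
)
freqs = first_base_substitution_freqs(pairs)
print(freqs)
```

If the control substitution types stay low and flat across age, climate and genus while C→T rises, the reported patterns are more plausibly specific to deamination than to general error or mapping artefacts.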
On that specific point, Lines 622-624 and 651-653: You attribute the higher 5′ C>T frequencies in Oryza to greater susceptibility to post-mortem deamination, potentially linked to its tropical and sub-tropical distribution. However, because Oryza originates from consistently warmer regions while Hordeum is predominantly temperate, genus and temperature are strongly confounded in your dataset. This is also supported by your own variance partitioning analysis, where large shared variance fractions (temperature × genus) indicate that these two predictors are difficult to disentangle.
Furthermore, Figure 6 shows that when analysing each genus separately, the relationship between either annual mean temperature or collection temperature and 5′ C>T frequencies is no longer significant. This suggests that the global temperature-damage correlation you report is largely driven by genus-level differences rather than by temperature acting independently, or am I wrong? If I am, could you add some discussion explaining why, if temperature does affect deamination, the effect is not visible within each genus across its range of temperature values?
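The concern can be illustrated with a small simulation. This is a sketch on purely synthetic numbers, not your data: the temperature ranges, baseline damage levels, and noise are invented, and damage is generated with no temperature effect at all within either genus.

```python
import numpy as np


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    return float(np.polyfit(np.asarray(x, float), np.asarray(y, float), 1)[0])


rng = np.random.default_rng(0)
n = 200

# Synthetic assumption: the two genera occupy non-overlapping temperature ranges.
temp_hordeum = rng.uniform(5, 15, n)   # temperate
temp_oryza = rng.uniform(20, 30, n)    # tropical / sub-tropical

# Synthetic assumption: damage depends on genus only, never on temperature.
damage_hordeum = 0.02 + rng.normal(0, 0.005, n)
damage_oryza = 0.06 + rng.normal(0, 0.005, n)

# Pooled regression ignores genus; the within-genus fits do not.
pooled_slope = ols_slope(
    np.concatenate([temp_hordeum, temp_oryza]),
    np.concatenate([damage_hordeum, damage_oryza]),
)
slope_hordeum = ols_slope(temp_hordeum, damage_hordeum)
slope_oryza = ols_slope(temp_oryza, damage_oryza)

print(f"pooled slope:   {pooled_slope:.5f}")
print(f"Hordeum slope:  {slope_hordeum:.5f}")
print(f"Oryza slope:    {slope_oryza:.5f}")
```

In this setup the pooled slope is clearly positive while both within-genus slopes stay close to zero, which is the same qualitative pattern as panels a-b versus c-d of Figure 6. It does not show that temperature has no effect in your data, only that the pooled correlation alone cannot distinguish a temperature effect from a genus baseline difference.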
While I agree that environmental conditions at the time of collection may influence DNA degradation, another factor that could contribute to the observed genus-specific patterns is reference-read divergence. Indeed, a recent unreviewed work (see preprint: https://doi.org/10.1101/2025.07.16.665190) showed that the percentage identity between the reference genome and the ancient reads can influence apparent damage estimates. Although divergence between the ancient Hordeum/Oryza reads and their respective references is unlikely to be extreme, given that plants do not evolve as rapidly as microbial taxa, a sanity check (e.g., adding the average percentage identity of each species per genus to the model) would help confirm that reference mismatch is not inflating differences in estimated 5′ C>T frequencies between genera.
Minor comments
Title : "Patterns of aDNA Damage Through Time end Environments" → "Time and Environments."
Line 95: ex situ should be in italics.
Line 140 and elsewhere: Oryza should be in italics whenever used as a genus (same for Hordeum).
Line ~551: "extremally well-preserved samples" → "extremely well-preserved samples."
It may help to add one sentence acknowledging that classical laboratory negative controls (blank extractions) are not relevant to the regression models, but that misincorporation spectra and MapDamage profiles effectively serve as authenticity checks (Methods, lines 176-187 and 238-247).
Discussion lines 641-648 compare herbarium specimens to bones and arthropods. It might help the reader if you add one explicit sentence summarizing why age-fragmentation relationships are detectable in herbaria but not in bones (standardized post-collection environment, as you nicely explain in lines 595-603).
In Figure 6, consider adding a brief note in the legend stating that the strong relationship in panels a-b is largely driven by contrasting climates and baseline damage between genera, and that it disappears within genera (c-d). This would remind readers of the confounding you discuss in the text.
In the Methods you state that you used linear models (lm) for regressions and varpart + rda for variance partitioning (lines 197-201 and 269-281). While the overall approach is reasonable, it would help to briefly address whether model assumptions (normality, homoscedasticity) were checked for the linear regressions (e.g. on log-transformed variables).
While the manuscript mentions storage effects in the discussion, it doesn't explore them in great detail. More focus on specific herbarium storage methods (e.g., temperature, humidity control) might help contextualize the minor storage effects observed. A brief section or discussion on institutional preservation practices and their variability could provide readers with more context about herbarium differences.
