Assessing the potential of ancient protein sequences in the study of hominid evolution
Abstract
Palaeoproteomic data can provide invaluable insights into hominid evolution over long timescales. Yet the potential and limitations of ancient protein sequences for resolving evolutionary relations between species remain largely unexplored. In this study, we aim to quantify how much information about these relations can be obtained from limited ancient protein data, at the scale that is currently available or will be available in the near future. We use sequence alignments of 12 enamel and collagen proteins that have previously been reported in fossil material at least 1 million years old. We utilise in silico translations of hominid DNA sequences of these proteins and highlight their differential sequence conservation, which indicates that some of them contain much more information than others. We also evaluate the extent to which topologies inferred from protein data differ from topologies inferred from the more informationally dense DNA data. We show that protein data may sometimes lead to inference of the wrong tree topology because of the information lost when working with peptide data. Additionally, we determine the number of concatenated proteins necessary to confidently reconstruct the population/species tree summarising the relations between humans, chimpanzees and gorillas, as well as those between modern humans, Neanderthals and Denisovans. As expected, increasing the number of proteins in a concatenation enhances resolution, but we note that trees inferred from the full set of collagen and enamel proteins do not necessarily correspond to population trees inferred from genome-wide data, especially in closely related groups.
Our study underscores the potential and limitations of utilising palaeoproteomic data in deep-time phylogenetic reconstructions, indicating that these reconstructions will be aided not only by increased recovery of proteins in the future, but also by more careful modelling of evolutionary relations across the genome, beyond simply building single phylogenetic trees.
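The informational loss mentioned above arises in part because several codons encode the same amino acid, so some DNA differences leave no trace in the translated protein. The following is a minimal sketch of that effect and not the pipeline used in the study: the two sequences are invented for illustration, the codon table is a deliberately partial subset of the standard genetic code, and the helper names `translate` and `count_differences` are hypothetical.

```python
# Toy illustration: synonymous DNA differences disappear after translation.
# The sequences below are invented for illustration, not real hominid data.

# Partial standard genetic code: only the codons used in this example.
CODON_TABLE = {
    "ATG": "M",
    "GGT": "G", "GGC": "G",
    "CCT": "P", "CCC": "P",
    "GCT": "A",
    "TCT": "S",
    "AAA": "K", "AAG": "K",
}


def translate(dna):
    """Translate an in-frame DNA string codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_differences(a, b):
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Two hypothetical aligned coding sequences (5 codons each).
dna_a = "ATGGGTCCTGCTAAA"
dna_b = "ATGGGCCCCTCTAAG"

dna_diffs = count_differences(dna_a, dna_b)
protein_diffs = count_differences(translate(dna_a), translate(dna_b))

print(dna_diffs, protein_diffs)  # 4 1
```

In this toy case four nucleotide differences collapse to a single amino acid difference, so a tree built from the peptide alignment has fewer informative sites to work with than one built from the underlying DNA.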