Unbiased whole genome comparison of Pan paniscus (bonobo) and Homo sapiens (human) through a novel sequence match-based approach
Listed in
- Evaluated articles (Arcadia Science)
Abstract
Due to technical and computational limitations, original attempts to compare humans to other non-human primates (NHP) were restricted to specific gene and protein comparisons. With the advances in supercomputing and whole genome sequencing technology, these studies can be revisited to explore entire genomes unbiasedly. A novel alignment-dependent homology algorithm that utilizes a linear search-based approach to find segments of homolog sequences based on a given length of word size, ranging from 32 bp to 1000 bp, was used to perform whole genome comparisons. These sequences were then compared over each chromosome of both the target and control species. Chromosome similarities between Pan paniscus and Homo sapiens varied greatly across various chromosomes. At 32-bp granularity, chromosome 3 showed the highest similarity (91.96%), while chromosome 5 showed the lowest similarity (59.66%). Overall, this indicates that while there are significant similarities in the anatomical structures, physiological structures, and protein similarities, there are significant differences in the genomic code between the two species. Additionally, not all sequences are conserved equally, underscoring the need to study the role that gene duplications, transpositions, and horizontal gene transfer may play in species divergence.
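The word-match idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of non-overlapping words, and the requirement of exact matches are all assumptions made here for illustration. The percentage is computed as matched base pairs over the target sequence length.

```python
def matched_base_pairs(target: str, control: str, word_size: int) -> int:
    """Count target bases that fall in words found verbatim in control.

    Assumptions (not from the source): the target is cut into
    non-overlapping words of `word_size` bases, a word counts as matched
    only on an exact match, and a trailing partial word is ignored.
    """
    if word_size <= 0:
        raise ValueError("word_size must be positive")
    matched = 0
    for start in range(0, len(target) - word_size + 1, word_size):
        word = target[start:start + word_size]
        # `in` on strings performs a substring search over the control.
        if word in control:
            matched += word_size
    return matched


def percent_ratio(target: str, control: str, word_size: int) -> float:
    """Matched base pairs as a percentage of the target length."""
    if not target:
        raise ValueError("target must be non-empty")
    return 100.0 * matched_base_pairs(target, control, word_size) / len(target)


if __name__ == "__main__":
    # Toy sequences, invented purely for illustration.
    toy_target = "ACGTACGTTTTT"
    toy_control = "GGACGTACGGTT"
    print(matched_base_pairs(toy_target, toy_control, 4))
    print(round(percent_ratio(toy_target, toy_control, 4), 2))
```

Real chromosomes are hundreds of megabases long, so a practical tool would need indexing or parallelism; the sketch only shows how word size controls what counts as a match.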
Article activity feed
-
The observation that certain chromosomes, such as chromosome 3, exhibit significantly higher similarity than others, such as chromosome 5, highlights the importance of analyzing chromosome-specific homology rather than relying on averaged genome-wide comparisons. This heterogeneity suggests that different genomic regions have experienced varying rates of evolution and may be subject to different selective pressures. Further investigation is warranted to understand the underlying mechanisms driving these differences. Potential factors could include varying rates of mutation, recombination, gene duplication, transposition, and horizontal gene transfer.
Measuring gene similarity within each chromosome (the traditional method of detecting relatedness) would make a very strong supplemental figure, establishing a baseline against which to compare GeneCompare. How does the previous, biased approach compare with this new unbiased approach?
-
The corresponding chromosomes of each species tested against each other are shown in Table 1. “Matched Pairs” represent the numerical amount of total base pairs matched between the two chromosomes, “Total Pairs” is the numerical length of base pairs in the P. paniscus chromosome, and “Percent Ratio” is the ratio between Matched Pairs and Total Pairs, expressed as a percentage.
Because of the high number of match queries, it is unlikely that the percent ratios would change much with subsequent runs of GenomeCompare. However, to alleviate concerns about algorithm stability, it may be worth running GeneCompare multiple times at each of these three granularities so that confidence intervals can be added to these ratios.
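One way such intervals could be computed from repeated runs is sketched below. The function name is hypothetical and the input values are placeholders, not results from the paper. The interval uses a normal approximation, which is an assumption and will be somewhat too narrow when only a handful of runs are available.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) for repeated percent-ratio measurements.

    Uses a normal approximation to the sampling distribution of the mean;
    with very few runs a Student t interval would be wider.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate spread")
    mean = statistics.fmean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * sem, mean + z * sem


if __name__ == "__main__":
    # Placeholder percent ratios from three hypothetical runs.
    hypothetical_runs = [10.0, 12.0, 14.0]
    mean, lower, upper = mean_confidence_interval(hypothetical_runs)
    print(f"{mean:.2f} ({lower:.2f}, {upper:.2f})")
```

If the algorithm is fully deterministic, every run returns the same ratio and the interval collapses to a single point, which would itself answer the stability question.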
-
These results indicate that chromosome-to-chromosome comparisons prove more indicative of relatedness than averaged genome-to-genome comparisons.
As mentioned, changing the granularity of chromosome comparisons does not perfectly preserve the rank order of relatedness (e.g., chromosome 16 goes from third lowest at 32 bp to lowest at 200 bp, while chromosomes 1, 6, and 11 remain ranked third, first, and second, respectively). However, this does not necessarily make chromosome-level comparisons "more" indicative of relatedness. Applying CompareGenome to more species, especially ones with varied evolutionary histories, genetic architectures, and mutation rates, seems like the next logical step (as suggested in the discussion section) towards providing evidence for this claim.
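How well rank order is preserved between two granularities could be quantified with a Spearman rank correlation, sketched below. The function name, chromosome labels, and similarity values are placeholders invented for illustration, and the simple formula used assumes no tied values.

```python
def spearman_rho(sim_a: dict, sim_b: dict) -> float:
    """Spearman rank correlation between two {chromosome: similarity} maps.

    Assumes both maps share the same keys and contain no tied values,
    so the shortcut formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies.
    """
    if sim_a.keys() != sim_b.keys():
        raise ValueError("both granularities must cover the same chromosomes")
    n = len(sim_a)
    if n < 2:
        raise ValueError("need at least two chromosomes")

    def ranks(sim: dict) -> dict:
        # Rank 1 is the most similar chromosome.
        ordered = sorted(sim, key=sim.get, reverse=True)
        return {name: position + 1 for position, name in enumerate(ordered)}

    rank_a, rank_b = ranks(sim_a), ranks(sim_b)
    d_squared = sum((rank_a[name] - rank_b[name]) ** 2 for name in sim_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


if __name__ == "__main__":
    # Placeholder similarities at two hypothetical word sizes.
    small_word = {"chrA": 90.0, "chrB": 80.0, "chrC": 70.0, "chrD": 60.0}
    large_word = {"chrA": 85.0, "chrB": 60.0, "chrC": 75.0, "chrD": 50.0}
    print(spearman_rho(small_word, large_word))
```

A value near 1 would indicate that the ranking is largely stable across granularities; lower values would flag the kind of reshuffling noted above.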
-