Widespread of horizontal gene transfer events in eukaryotes

This article has been reviewed by the following groups

Abstract

Horizontal gene transfer (HGT) is the transfer of genetic material between distantly related organisms. While most genes in prokaryotes can be horizontally transferred, HGT events in eukaryotes are considered rare, particularly in mammals. Here we report the identification of HGT regions (HGTs) in 13 model eukaryotes by comparing their genomes with 824 eukaryotic genomes. Between 4 and 358 non-redundant HGTs per species were found in the genomes of the 13 model organisms, and most of these HGTs were previously unknown. The majority of the 824 eukaryotes with full-length genome sequences also contain HGTs. These HGTs have proliferated in their host genomes, reaching thousands of copies, and have affected hundreds, even thousands, of genes. We extended this analysis to ~128,000 prokaryotic and viral genomes and revealed a few potential routes of horizontal gene transfer involving blood-sucking parasites, intracellular pathogens, and bacteria. Our findings show that HGTs are widespread in eukaryotic genomes and that HGT is a ubiquitous driver of eukaryotic genome evolution.

Article activity feed

  1. For instance, among the 313 non-redundant HGT trees for Homo sapiens, Pan troglodytes was found in 312 of them, therefore the HGT-appearance number N_HP between Homo sapiens and Pan troglodytes was 312.

    This is still fairly confusing, and I'm not sure what it means.

  2. Widespread of HGTs among eukaryotes

    Did you do anything to deal with contamination? Contamination is fairly widespread, even in RefSeq genomes, and might lead to unexpected results.

  3. Functional annotation for genes overlapping with HGTs (see Methods) revealed some significantly enriched Gene Ontology terms (GO terms) (Bonferroni < 0.05) for protein-coding genes from mouse, fruit fly and nematode as well as non-coding genes from yeast (Table S11). The significant GO terms for nematode were “hemidesmosome, intermediate filament”, while the significant GO term for mouse was “protein kinase A binding”. HGTs in fruit fly that overlapped with coding genes were enriched for “ATP binding, lipid particle, microtubule associated complex”, etc. HGTs in yeast overlapped with non-coding genes enriched for “retrotransposon nucleocapsid, transposition, RNA-mediated, cytosolic large ribosomal subunit”, etc.

    Shouldn't this be part of the Results section?

  4. We further evaluated the pipeline with a genome containing simulated HGT regions. Since our HGT identification pipeline has two main steps, sequence composition-based filtering step and genome comparison step. The evaluation was done for the two steps (Figure S3, Table S1). While top 1% fragments were input to the pipeline, 20.6% correct results would be identified after sequence composition-based filtering and 14.3% correct results identified after genome comparison. When the percentage of fragments input was up to 50%, 83.4% and 77.7% correct results were identified after two steps respectively. It can be seen that the precision of prediction was higher than 60% for all cases. This indicated that we may have underestimated the number of HGTs (low recall rate) but majority of the identified HGTs were highly reliable.

    This paragraph was a bit confusing to follow, but I think I got the gist of it after a few passes! I'm curious whether you thought about controlling for natural variation in 4-mer frequency throughout the genome, as some other methods have found that this helps reduce off-target predictions (reviewed in https://doi.org/10.1371/journal.pcbi.1004095). It may not be necessary since you do a second step after the initial screen, but I was curious whether you considered putting something like that in place, and if so, why you decided against it.

  5. non-redundant

    Would you be willing to provide a clearer definition of "non-redundant" here? Does this mean there are no paralogs of the gene? Or that the HGT occurred in only one model organism? Or in only one of all the 824 + 13 genomes that you investigated?

  6. The copy number of each HGT was determined from the number of merged HGT copies

    Are all of these long-read genomes? If not, will this be an unreliable estimate?

  7. 1000-bp segments with 200-bp

    How did you choose these numbers? In metagenome binning, 1 kb isn't long enough to get confident estimates of tetranucleotide frequency; you often need > 2,500 bp.
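Item 1 quotes the definition of the HGT-appearance number. One reading of it, sketched here under the assumption that each HGT tree can be reduced to the set of species at its leaves: the number for a species pair counts the focal species' HGT trees in which the second species also appears. The function name and the toy trees are hypothetical; applied to the authors' data, 312 of the 313 Homo sapiens trees containing Pan troglodytes would give a value of 312.

```python
from collections import Counter
from typing import Iterable


def hgt_appearance_numbers(trees: Iterable[set[str]], focal: str) -> Counter:
    """For every other species, count how many of the focal species' HGT trees contain it.

    Each tree is represented only by the set of species at its leaves (hypothetical)."""
    counts: Counter = Counter()
    for leaves in trees:
        for species in leaves - {focal}:
            counts[species] += 1
    return counts


# Toy input: three hypothetical HGT trees for Homo sapiens.
trees = [
    {"Homo sapiens", "Pan troglodytes", "Mus musculus"},
    {"Homo sapiens", "Pan troglodytes"},
    {"Homo sapiens", "Plasmodium falciparum"},
]
n = hgt_appearance_numbers(trees, "Homo sapiens")
print(n["Pan troglodytes"])  # 2
```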
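Item 3 reports GO terms with Bonferroni-corrected significance below 0.05 but does not say which test produced the raw p-values. Assuming a one-sided hypergeometric test per term (a common choice for GO enrichment, not confirmed by the excerpt), the sketch below computes each term's upper-tail probability and multiplies it by the number of terms tested. All gene counts and term names are made up for illustration.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for a hypergeometric X: N genes in total, K carry the term, n are drawn."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bonferroni_enriched(terms: dict[str, tuple[int, int]], N: int, n: int,
                        alpha: float = 0.05) -> dict[str, float]:
    """terms maps a GO term to (K, k): K genes annotated genome-wide, k of them among
    the n HGT-overlapping genes. Returns terms whose Bonferroni-adjusted p < alpha."""
    m = len(terms)  # number of tests used for the Bonferroni correction
    enriched = {}
    for term, (K, k) in terms.items():
        p_adj = min(1.0, hypergeom_sf(k, N, K, n) * m)
        if p_adj < alpha:
            enriched[term] = p_adj
    return enriched


# Hypothetical counts: 20,000 genes, 200 of which overlap HGTs.
terms = {"term_A": (100, 10), "term_B": (1000, 12)}
print(sorted(bonferroni_enriched(terms, N=20000, n=200)))  # ['term_A']
```

Here term_A is observed 10 times where about 1 is expected, while term_B is observed 12 times where about 10 are expected, so only term_A survives the correction.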
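Item 4 quotes an evaluation on simulated HGT regions that the reviewer found hard to follow. Precision (the fraction of predicted regions that are real) and recall (the fraction of real regions that are recovered) separate the two claims in the quote: high precision with lower recall is what "underestimated ... but highly reliable" amounts to. A minimal sketch, assuming a prediction counts as correct when it overlaps any simulated region; the excerpt does not state the matching criterion, and the coordinates are toy values.

```python
Interval = tuple[int, int]  # half-open [start, end) coordinates in bp


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def precision_recall(predicted: list[Interval], truth: list[Interval]) -> tuple[float, float]:
    # A prediction is a true positive if it overlaps at least one simulated HGT region.
    correct = sum(any(overlaps(p, t) for t in truth) for p in predicted)
    # A simulated region is recovered if at least one prediction overlaps it.
    recovered = sum(any(overlaps(t, p) for p in predicted) for t in truth)
    precision = correct / len(predicted) if predicted else 0.0
    recall = recovered / len(truth) if truth else 0.0
    return precision, recall


truth = [(0, 1000), (5000, 6000), (10000, 11000), (20000, 21000)]
predicted = [(200, 1200), (5100, 5900), (30000, 31000)]
p, r = precision_recall(predicted, truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```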
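Item 6 says copy number came from the number of merged HGT copies. A minimal sketch of one way to do this, assuming each HGT sequence has been aligned back to its genome and that hits on the same chromosome that overlap (or lie within a hypothetical `max_gap`) are merged into a single copy. It also shows why the reviewer's question matters: the count can only reflect copies that exist as separate loci in the assembly, so repeats collapsed by a short-read assembly would lower it.

```python
from collections import defaultdict


def merged_copy_number(hits: list[tuple[str, int, int]], max_gap: int = 0) -> int:
    """Count copies of one HGT from its alignment hits (chrom, start, end), half-open bp.

    Hits on the same chromosome that overlap, or are separated by at most max_gap bp,
    are merged into one copy. max_gap is a hypothetical parameter."""
    by_chrom: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in hits:
        by_chrom[chrom].append((start, end))
    copies = 0
    for intervals in by_chrom.values():
        intervals.sort()
        current_end = None
        for start, end in intervals:
            if current_end is None or start > current_end + max_gap:
                copies += 1  # this hit starts a new merged copy
                current_end = end
            else:
                current_end = max(current_end, end)  # this hit extends the current copy
    return copies


hits = [("chr1", 100, 600), ("chr1", 550, 1200), ("chr1", 5000, 5600), ("chr2", 100, 700)]
print(merged_copy_number(hits))  # 3
```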
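Item 7 concerns the 1000-bp segments. The sketch below slices a sequence into 1000-bp windows, computes each window's tetranucleotide frequency (TNF), and measures its L1 distance from the whole-sequence TNF. The highlighted text is cut off after "200-bp", so treating 200 bp as the step is an assumption, as are the function names and the choice of L1 distance. Comparing against a single genome-wide profile is also exactly what does not control for the natural variation raised in item 4; a method that does would compare each window to a local or model-based expectation instead. The last loop illustrates the reviewer's point: a 1000-bp window holds only 997 overlapping 4-mers spread across 256 possible 4-mers.

```python
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 possible 4-mers
INDEX = {kmer: i for i, kmer in enumerate(TETRAMERS)}


def tnf(seq: str) -> list[float]:
    """TNF vector from overlapping 4-mers; 4-mers containing non-ACGT characters are skipped."""
    counts = [0] * 256
    for i in range(len(seq) - 3):
        j = INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def windows(seq: str, size: int = 1000, step: int = 200):
    """Yield (start, window) pairs; step=200 is an assumed reading of '200-bp'."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def deviation_from_background(seq: str, size: int = 1000, step: int = 200):
    """L1 distance between each window's TNF and the whole-sequence TNF."""
    background = tnf(seq)
    return [(start, sum(abs(a - b) for a, b in zip(tnf(w), background)))
            for start, w in windows(seq, size, step)]


# Mean number of observations per possible 4-mer in one window of each size.
for size in (1000, 2500):
    print(size, round((size - 3) / 256, 2))  # prints "1000 3.89" and "2500 9.75"
```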