First chromosome-level genome assembly of the colonial tunicate Botryllus schlosseri

Abstract

Botryllus schlosseri (Tunicata) is a colonial chordate that has long been studied for its multiple developmental pathways and regenerative abilities and its genetically determined allorecognition system based on a polymorphic locus that controls chimerism and cell parasitism. We present the first chromosome-level genome assembly from an isogenic colony of B. schlosseri clade A1 using a mix of long and short reads scaffolded using Hi-C. This haploid assembly spans 533 Mb, of which 96% are found in 16 chromosome-scale scaffolds. With a BUSCO completeness of 91.2%, this complete and contiguous B. schlosseri genome assembly provides a valuable genomic resource for the scientific community and lays the foundation for future investigations into the molecular mechanisms underlying coloniality, regeneration, histocompatibility, and the immune system in tunicates.

Article activity feed

  1. This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf097), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

    Reviewer 3: Cristian Canestro

    TO THE AUTHORS

    In this MS entitled 'First chromosome-level genome assembly of the colonial chordate model Botryllus schlosseri (Tunicata)', Olivier De Thier and colleagues report the first chromosome-scale assembly of this colonial ascidian species, paying special attention to differences with previously published assemblies and, importantly, between haplotypes. The MS is very well written and very easy and pleasant to read. It provides data of great quality that are very relevant not only for the ascidian/tunicate community but also for the field of genome structural evolution. I firmly recommend it for publication, although I think that the authors could discuss the results in deeper detail. In particular, I miss a more elaborate discussion of what the results add to our understanding of the similarities and differences between clades that have been published in recent years (I have not been able to find some relevant articles in this regard cited in the bibliography). I also feel that a deeper analysis of the differences between haplotypes could be very interesting, unless they are artifactual effects of the assemblies. As mentioned below, unless this is part of a longer story for a different MS beyond the scope of this one, I encourage the authors to validate some of the differences they find between haplotypes, to try to correlate the structural variations with differences in gene counts between haplotypes, and to explore whether these differences could be correlated with aspects of biological relevance. I miss, for instance, Venn diagrams of gene content comparing previous assemblies with the haplotypes and haploid genome reported here. In any case, I firmly recommend this MS for publication, since most of my suggestions are intended not to question the results of the MS but to improve it, and I understand that some may go beyond its scope.

    Minor points: Introduction Page 1: "the basic body plan of adult tunicates is highly conserved across the entire subphylum [3]". This sentence, which could be OK for ascidians, probably gives a highly simplified view of tunicate adult morphologies, especially when comparing the divergent morphologies of thaliaceans and appendicularians. Please elaborate on this sentence.

    To understand the comparisons between the data of this MS and previously reported genomes, it seems crucial to understand well the meaning of the "clades and subclades". Please include in the introduction (or where needed) how those clades are defined, what their origins and biological/geographical differences are, and any other critical information that will help non-tunicate readers in particular to understand the results.

    Results: The authors refer to the presence of large-scale genomic palindromes in Bs1 and Bs3, but it is unclear what these structures are. Please provide a more detailed explanation of the palindromic nature of these regions.

    The data on the haplotype-resolved assemblies are very interesting. I wonder if it is possible to measure the amount of heterozygosity between haplotypes 1 and 2, and between those and the previous versions of the genome, to better understand intra- and inter-subclade variation. The differences in the size of some regions between Colombera and this study, and even between haplotypes 1 and 2, are very interesting. I would find it more informative to merge the three graphs of Figure S9 into a single graph, so that we can also easily compare the differences in size between the haplotypes and the haploid assembly. If some of those differences are actually due to deletions, that would deserve further analysis. If this analysis is not part of another ongoing project that will be published elsewhere, I suggest identifying some of those differences with a dot plot, especially between haplotypes, and validating with long reads crossing those regions whether some of the deletions are real or artifactual. Please include the dot plot together with the two haplotypes in Figure S10. In those cases that could be real, it would be very interesting to know which genes are gone, and whether they have been placed somewhere else in the genome as a result of translocations or are actually gone, which could explain some of the differences in gene count reported between haplotypes.

    The authors mention the presence of multiple structural variations, some of which could be artifacts of misassemblies. Interestingly, the plot of the synteny blocks between the two haplotypes in Figure S11 shows some of those structural variations, including cases of:

    • deletions: for instance, there are "blank" regions in Bs1A and Bs3A with no lines, which may reflect areas that are not present in haplotype B.
    • duplications and translocations within chromosomes or between chromosomes of different haplotypes. Just looking at this plot, I wonder how the chromosomes were assigned to haplotypes. For instance, I see that Bs7B shares a duplicated synteny block with chromosomes Bs10B and Bs14B, but not with Bs10A and Bs10B, which means that the duplications are intra-haplotype, present in B but not in A. But I wonder if it is possible that Bs10B and Bs14B could in fact be switched to haplotype A, in which case there would be no duplication or deletion in either haplotype, just a simple translocation. I may be wrong in the interpretation, but I am curious to understand the graph. In any case, as mentioned above, it would be worthwhile to validate some of those variations with long reads, which could illuminate the biological relevance of the differences between haplotypes and rule out potential assembly artifacts.

    I notice that in Figures 7 and S13, some lines are thicker than others. Is this because many "thin" lines overlap and so look like a "thick" line? Otherwise, the visual effect of different thicknesses could be misleading. Please clarify.

    In the analysis of the Hox cluster the authors say "[…] our new assembly revealed that B. schlosseri's Hox genes are not scattered. Instead, eight of them were clustered on the second largest scaffold (Bs2), whereas two other ones are found on the 15th largest scaffold (Bs15)." Generally, describing Hox genes as clustered means that they lie in close vicinity, with few or no other genes between them. Therefore, I would not describe eight Hox genes as clustered simply because they are on the same chromosome (maybe even on different arms).

  2. Reviewer 2: Tilman Schell

    Review of

    First chromosome-level genome assembly of the colonial chordate model Botryllus schlosseri (Tunicata)

    from

    Olivier De Thier, Marie Lebel, Mohammed M. Tawfeeq, Roland Faure, Philippe Dru, Simon Blanchoud, Alexandre Alié, Federico D. Brown, Jean-François Flot and Stefano Tiozzo

    Comments to the authors

    De Thier et al. present a high-quality chromosome-scale de novo assembly of the tunicate Botryllus schlosseri from mainly PacBio HiFi and Arima Hi-C reads. Additional WGS Illumina and ONT data were used to resolve assembly errors or support the correctness of the assembly structure. Structural and functional annotations are conducted thoroughly. Downstream analyses include a synteny comparison of different Tunicata based on ancestral linkage groups and Hox genes.

    The manuscript is well written and methods are mostly described to ensure reproducibility. Despite the good shape of the manuscript, I would like to give some remarks, which should be addressed in a revised manuscript before publication.

    General remarks

    I like the quote in the beginning of the introduction.

    The authors conducted downstream analyses with different related chromosome-level tunicate genome assemblies. For assembly metrics, however, the comparison covers the BUSCO assessment only. I would also point out the high quality of the B. schlosseri assembly in Tables 2 and 4 by comparing it with the other chromosome-level, annotated tunicate genome assemblies.

    I am not an expert regarding tunicates, so please excuse my basic, curiosity-driven question: In the results section "The laboratory model Sub-clade A1" you state that a part of COI is used as a barcode to differentiate ascidian species. In the introduction you state that wild colonies are able to fuse, resulting in mixed genotypes. Since sample E* was derived from the wild at some point, it might be theoretically possible to have not only mixed nuclear genotypes but mixed mitotypes too. Depending on how old sample E* is and how fast fixation of a mitotype can happen within a colony, this might be reflected in your data. Furthermore, this thought could be extended to nuclear genotypes, which could hamper scientific findings.

    Contamination filtering was based only on a sequence similarity search and the taxonomic assignment of blobtools. Although blobtools/blobtoolkit was applied, I was not able to find a blobplot in the supplemental files. I would like to encourage the authors to add blobplots before and after contamination filtering, at least to the supplement. In my opinion, blobplots are most powerful when considering GC content and coverage in the first place, especially when dealing with taxa that are underrepresented in public databases. Therefore, using taxonomic assignment alone for contamination filtering might generate false positives (e.g. conserved sequences across the tree of life with a taxonomic assignment other than Chordata but with similar GC and coverage to the target) and false negatives (e.g. short sequences of the assembly that could not be assigned and that differ in GC and coverage from the target).
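
    A screen on GC content and coverage of the kind described above can be sketched as follows. This is a minimal illustration only, not the authors' pipeline and not blobtools itself: the function names, the toy contigs, and the thresholds (GC within 0.10 of the assembly median, coverage within a factor of 2 of the median) are assumptions chosen for the example.

```python
from statistics import median

def gc_fraction(seq: str) -> float:
    """Fraction of G/C among A/C/G/T bases (N and other symbols are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def flag_outliers(contigs, max_gc_dev=0.10, max_cov_fold=2.0):
    """contigs maps name -> (sequence, mean coverage).

    Returns the names whose GC fraction or coverage deviates from the
    assembly-wide median by more than the (hypothetical) thresholds.
    """
    gcs = {name: gc_fraction(seq) for name, (seq, _) in contigs.items()}
    gc_med = median(gcs.values())
    cov_med = median(cov for _, cov in contigs.values())
    flagged = []
    for name, (_, cov) in contigs.items():
        gc_off = abs(gcs[name] - gc_med) > max_gc_dev
        cov_off = cov > cov_med * max_cov_fold or cov < cov_med / max_cov_fold
        if gc_off or cov_off:
            flagged.append(name)
    return flagged

# Toy data (invented): ctg3 has unusual GC, ctg4 has unusual coverage.
toy = {
    "ctg1": ("ATATGCATAT" * 10, 30.0),
    "ctg2": ("ATATGCATTA" * 10, 28.0),
    "ctg3": ("GCGCGCGCAT" * 10, 31.0),
    "ctg4": ("ATATGCATAT" * 10, 150.0),
}
print(flag_outliers(toy))
```

    Flagged sequences would still need manual inspection, since organellar sequences and repeats can also deviate in coverage without being contaminants.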

    In the paragraphs "Results and Discussion" (Haplotype-resolved assembly) as well as in "Methods" (Haploid genome assembly) you use the term "haploid assembly" multiple times. I find this term misleading, since the genome is not haploid and the assembly represents both haplotypes at the same time. I assume that primary contigs from hifiasm were used to generate this assembly. Therefore, I would suggest calling this assembly, for example, "based on primary contigs", "non phased", "haplotype mixed" or "haplotype unresolved" (as opposed to "haplotype resolved").

    Particular remarks

    Results and Discussion

    Sequencing and genome size estimation

    Table 1 Please specify what "round 1" and "round 2" are referring to. Was one library sequenced twice or were two different libraries created and sequenced?

    Haploid genome assembly

    "We identified 28 contigs that belong to spore-forming unicellular parasites of the microsporidia group [32]. This represents the first report of this fungal group in a tunicate species." Is this identification based on blobtools taxonomic assignment? This is not described in the methods. Furthermore, can you rule out that identification or taxonomic assignment is false positive? If not you should tune down the second sentence and maybe discuss this.

    "We then performed Hi-C scaffolding using YaHS [34], which reduced the number of contigs to 256, before [...]" Technically, scaffolding with yahs can only increase the number of contigs because original (hifiasm) contigs are split because of the Hi-C signal (at least as long the option --no-contig-ec isn't applied). I would substitute "contigs" with "sequences".

    "Finally, a manual curation was performed, resulting in an assembly made up of 16 major scaffolds [...]" Is there any previous study on the karyotype of B. schlosseri? If so, citing it here would strengthen your results. Otherwise, I would recommend to state the karyotypes or the number of chromosome scale scaffolds of other tunicates here and discuss, if your findings are in line.

    Table 2 Please substitute "No. of scaffolds" with "No. of sequences". Please add the contig N50 values. As pointed out above, I would like to see a comparison to the other chromosome level tunicate genome assemblies here, instead of showing basically the same stats twice.

    "[…] highlighted the presence of two large-scale genomic palindromes located within Bs1 and a smaller one in Bs3 (Figure 3)." The figure shows the presence but maybe you can highlight them in the figure and the caption even more?

    "To find out whether these palindromes may result from assembly artifacts [40], we checked the localization of the duplicated BUSCO genes along the chromosomes and did another run of CRAQ [...]" You could support your findings by showing an even coverage distribution within the palindromes, which is similar to the coverage distribution of whole assembly. Either as a histogram or a zoomed in version of the read coverage across reference as in the outer layer of the circos plot could show this nicely.

    Methods

    Sampling, DNA isolation, and sequencing

    "HiFi PacBio long reads" Please provide more details on how PacBio libraries (was it actually one library sequenced twice or two different libraries?) were created and sequenced. Were low or ultra-low protocols used? On which machine was sequencing conducted?

    RNA-seq data

    Is downloading public data a method? In any case you should cite the original papers and provide a list of accession numbers (supplement), but I would remove this paragraph and add the information to the paragraph "Genome annotation", e.g. "Publicly available RNA-seq reads [23, 25, 8] were aligned to the soft-masked assemblies [...]"

    Data preprocessing

    Depending on how the PacBio libraries were created and which PacBio machine was utilized for sequencing, you should state how HiFi calling was conducted (e.g. Sequel II) and how PCR adapter and duplicates were filtered out (e.g. ultra-low).

    Haploid genome assembly

    "To this aim, contigs were aligned to the NCBI nucleotide database (accessed 2023 March 18) using BLAST+ [78]" Please state the version of BLAST+.

    "Finally, a BLASTN search for fragments of the mitochondrial genome among the contigs was performed using the published complete mitochondrial genome of B. schlosseri (RefSeq NC_021463.1) [28]." Were the fragments filtered out based on the blast search? Please explain what was done in detail. Which hits were considered (e.g. cutoffs)? The mitochondrial genome of E* was assembled with NOVOPlasty, which is by the way not stated in the methods but in the results only. Was the assembled mt genome of E* added to the assembly, once the fragments were filtered out?

    Haplotype-resolved assembly

    If I understand correctly, the rapid curation pipeline was applied but no dual curation was conducted. When aiming for haplotype-resolved assemblies, I would recommend applying this method, e.g. concatenating both haplotypes and creating a combined contact map of haplotypes 1 and 2, which can be curated as usual, with the advantage of being able to exchange (parts of) sequences between the haplotypes. In some cases the phasing from hifiasm is not correct and can easily be corrected with this approach.
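
    The concatenation step mentioned above can be sketched as merging the two haplotype FASTA files while tagging each sequence name with its haplotype, so that names stay unique in the combined contact map. The function name, the `hap1_`/`hap2_` prefixes, and the toy records are assumptions for illustration; mapping the Hi-C reads and building the contact map are outside the scope of this sketch.

```python
def merge_haplotypes(hap1_fasta: str, hap2_fasta: str) -> str:
    """Concatenate two FASTA texts, prefixing headers with their haplotype of origin."""
    def tag(text, prefix):
        out = []
        for line in text.splitlines():
            if line.startswith(">"):
                out.append(">" + prefix + line[1:])  # rename the record
            elif line:
                out.append(line)  # keep sequence lines, drop empty lines
        return out
    return "\n".join(tag(hap1_fasta, "hap1_") + tag(hap2_fasta, "hap2_")) + "\n"

# Toy records (invented): both haplotypes use the same scaffold name.
hap1 = ">scaffold_1\nACGTACGT\n"
hap2 = ">scaffold_1\nACGTTCGT\n"
print(merge_haplotypes(hap1, hap2), end="")
```

    Keeping the haplotype in the sequence name makes it straightforward to split the curated assembly back into two haplotypes afterwards.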

  3. Reviewer 1: Jerome Hui

    In this manuscript, De Thier and colleagues report the chromosome-level genome assembly of the tunicate Botryllus schlosseri (Pallas, 1766) sub-clade A1. The methods used in this study are standard. B. schlosseri has been used as a laboratory model in certain places for decades to understand asexual development and regeneration. Although a draft-quality genome was published a decade ago (eLife 2013, 2:e00569), the authors here produced a high-quality phased genome based on modern technologies. In terms of genomic resources for this laboratory model, this is important and useful. The authors have also carried out analyses of repeats, synteny, and Hox cluster genes, and I think some of these results are interesting. Below are my comments and suggestions for the authors to consider, which hopefully can further improve the manuscript.

    1. Given that the authors merged the results and discussion into one section, I would expect more discussion for several parts, including:
    • a. Repeats - For now, the analysis is quite standard and the main text is relatively descriptive. What have we learnt from understanding the repeats of the B. schlosseri genome? The authors should tell the readers.
    • b. Synteny analyses - This is an interesting finding. Extensive chromosomal rearrangement has also been discovered in other animals recently. Can the authors further discuss these events?
    • c. Hox gene analyses - Again, it is quite descriptive. Tunicates have been well known for their dispersed Hox clusters for decades. So what have we learnt from the situation in B. schlosseri? I would be glad to see the authors discuss this.
    2. Figure S14
    • The authors should also show the bootstrap values on the key nodes.
    • In addition, the authors should use one more method, besides maximum likelihood, to construct the Hox gene tree.