Annotation and analysis of yellow genes in Diaphorina citri , vector for the Huanglongbing disease

Huanglongbing (HLB), also known as citrus greening disease, is caused by the bacterium Candidatus Liberibacter asiaticus (CLas) and represents a serious threat to global citrus production. This bacterium is transmitted by the Asian citrus psyllid, Diaphorina citri (Hemiptera), and there are no effective in-planta treatments for CLas. Therefore, one strategy is to manage the psyllid population. Manual annotation of the D. citri genome can identify and characterize gene families that could serve as novel targets for psyllid control. The yellow gene family represents an excellent target, as yellow genes are linked to development and immunity through their roles in melanization. Combined analysis of the genome with RNA-seq datasets, sequence homology, and phylogenetic trees was used to identify and annotate nine yellow genes in the D. citri genome. Phylogenetic analysis shows a unique duplication of yellow-y in D. citri, with life-stage-specific expression of the two copies. Genomic analysis also indicated the loss of yellow-f, a gene vital to the process of melanization, and the gain of yellow 9, a gene that seems to be unique to hemipterans. We suggest that yellow 9 or yellow 8 (c), which consistently groups closely with yellow-f, may take on this role. Manual curation of genes in D. citri has provided an in-depth analysis of the yellow family among hemipteran insects and provides new targets for molecular control of this psyllid pest. Manual annotation was done as part of a collaborative community annotation project.

  1. This article is a preprint and has not been certified by peer review.

    This work has been peer reviewed in GigaByte, which carries out open, named peer review. These reviews are published under a CC-BY 4.0 license and were as follows:

    **Reviewer 1. Feng Cheng.** Crissy and co-authors annotated yellow genes in the genome of Diaphorina citri, the vector of Huanglongbing disease in citrus plants. The results are useful for closely related research areas, and I have some comments to help the authors improve the manuscript.

    1. The sections of introduction and background can be merged into one introduction section.

    2. Many sentences in the results section can be moved to the methods section.

    3. The methods section should be rewritten and reorganized, with one paragraph per analysis.

    4. Some domain analysis and figures may be helpful for illustrating the evolution of important yellow genes in different insect species.

    **Reviewer 2. Mary Ann Tuli** Is the language of sufficient quality?
    Yes. Line 18: 'in-planta' should be 'in planta' (in italics).

    Are all data available and do they match the descriptions in the paper?
    No. Additional Comments:

    1. Line 224. "The MCOT protein sequences were used to search the *D. citri* genomes" We need the list of MCOT protein sequences that were used.

    2. Line 228. "A neighbor-joining phylogenetic tree of D. citri yellow protein sequences along with was created in MEGA version 7 using the MUSCLE multiple sequence alignment" a) Along with what? There are some words missing. b) We need the output of MUSCLE (FASTA). c) We need the files underlying the phylogenetic tree (newick).

    3. I note that MEGA7 has been used. I wonder why the newer release (MEGAX, March '21) was not used. Furthermore, the annotation protocol suggests using Mega7 or MegaX.

    4. Line 233. "Comparative expression levels of yellow proteins throughout different life stages (egg, nymph, and adult) in *Candidatus Liberibacter asiaticus* (Clas) exposed vs. healthy D. citri insects was determined using RNA-seq data and the Citrus Greening Expression Network." Results are presented in Fig 3(a) and Fig 3(b). We need the raw data underlying these figures.

    Are the data and metadata consistent with relevant minimum information or reporting standards?
    Yes. Nomenclature standards have been met. All cited INSDC accession numbers are publicly available.

    Is the data acquisition clear, complete and methodologically sound?
    Yes. The curation workflow used for community annotation is available separately; nonetheless, the manuscript includes a comprehensive summary, which is appropriate.

    Is there sufficient detail in the methods and data-processing steps to allow reproduction?
    No. See "Are all data available and do they match the descriptions in the paper?" above. Once the additional files are made available, I believe reproduction will be possible.

    Is there sufficient data validation and statistical analyses of data quality? Yes

    Is the validation suitable for this type of data? Yes

    Is there sufficient information for others to reuse this dataset or integrate it with other data? Yes

    Any Additional Overall Comments to the Author:
    Citation [39] is not complete. It should be MCOT protein database.

    Some of my comments/recommendations are pertinent to the other *D. citri* manuscripts currently under review.