PacBio HiFi sequence data reveal a chromosomal misassembly in the Candida parapsilosis CDC 317 reference sequence and provide the foundation for a phased diploid genome assembly
Abstract
The incidence of systemic candidiasis caused by the diploid ascomycetous yeast Candida parapsilosis is increasing. Research is aimed at understanding how the yeast interacts with its human host, as well as the mechanisms of antifungal drug resistance. This work describes the use of highly accurate, long-read DNA sequencing technology (Pacific Biosciences HiFi) to construct collapsed haploid and phased diploid genome assemblies for C. parapsilosis strain CDC 317, the strain used for the current GenBank reference sequence. These long-read genome assemblies revealed a translocation artifact in the reference sequence. The phased diploid assembly confirmed the well-known homogeneity between homologous chromosomes but revealed previously undescribed structural variation between copies of chromosome 8. Much of the sequence variation between chromosome homologs was found in genes that encode large, cell-surface glycoproteins and contain repeated sequence motifs. The data presented here extend knowledge of the C. parapsilosis genome. The HiFi read data set and the new genome assemblies will facilitate research aimed at preventing and controlling C. parapsilosis disease.