Realigning Clinical Studies from HG38 to T2T-CHM13 to Discover Differences Between Using Different Reference Genomes


Abstract

In genomics research, a reference genome is the standard sequence against which sequencing reads are aligned and compared. A human reference genome is meant to represent the complete set of DNA within a human cell. The first human reference genome was released in 2003 but was incomplete because of technological barriers. In December 2013, the HG38 genome (GRCh38), an update of earlier assemblies, was released; it remains the most widely used reference genome because of its abundant annotations. In 2022, T2T-CHM13 was released as the first complete human reference genome assembly, yet most groups still use HG38. I hypothesized that aligning to T2T-CHM13 would yield more mapped reads, because T2T-CHM13 is complete and therefore contains regions of the genome that older references, such as HG38, lack. In this study, I realigned publicly available renal carcinoma cell line RNA-sequencing data from the Cancer Cell Line Encyclopedia to the T2T-CHM13 genome and found that a higher percentage of primary reads aligned to T2T-CHM13 than to HG38 across all chromosomes. MAPQ scores further confirmed that these reads were uniquely aligned. Additionally, a t-test confirmed that the alignment results for HG38 and T2T-CHM13 differ significantly. These findings suggest that the T2T-CHM13 genome supports more comprehensive alignment and may yield more accurate insights. Future studies should therefore adopt T2T-CHM13 as the standard reference, and efforts should be made to improve its annotation so that it becomes more accessible. High-impact studies should also be realigned to T2T-CHM13 to determine whether any new conclusions emerge.
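The comparison described in the abstract can be illustrated with a minimal sketch. It assumes that each sample has a count of mapped primary reads and a total count of primary reads for each reference, and that the same samples are compared with a paired t-test. The abstract says only "a t-test", so the paired design, the function names, and all numbers below are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def percent_primary_mapped(primary_mapped: int, total_primary: int) -> float:
    """Return the percentage of primary reads that aligned to a reference."""
    if total_primary <= 0:
        raise ValueError("total_primary must be positive")
    return 100.0 * primary_mapped / total_primary


def paired_t_statistic(a: list[float], b: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) for a paired t-test of a versus b.

    Assumption: a[i] and b[i] come from the same sample aligned to two
    different references, so the test is run on per-sample differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical (mapped primary reads, total primary reads) per sample.
    chm13_counts = [(9_620, 10_000), (9_580, 10_000), (9_710, 10_000)]
    hg38_counts = [(9_510, 10_000), (9_440, 10_000), (9_630, 10_000)]

    chm13_pct = [percent_primary_mapped(m, n) for m, n in chm13_counts]
    hg38_pct = [percent_primary_mapped(m, n) for m, n in hg38_counts]

    t, df = paired_t_statistic(chm13_pct, hg38_pct)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```

A positive t here would indicate a higher mean mapping percentage for the first list. Turning t into a p-value requires the t distribution, for example from a statistics library, which this sketch leaves out.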
