Genotyping TOMM40’523 Poly-T Polymorphisms Using Whole-Genome Sequencing

Abstract

The TOMM40’523 poly-T repeat polymorphism (rs10524523), located in the TOMM40 gene and in linkage disequilibrium with APOE, has been associated with cognitive decline and Alzheimer’s disease (AD) progression. Accurate genotyping of this polymorphism is crucial for understanding its role in neurodegeneration. Challenges in processing whole-genome sequencing (WGS) data traditionally require additional PCR and targeted sequencing assays to genotype these polymorphisms. Here, we introduce a novel computational pipeline that integrates multiple short tandem repeat (STR) detection tools in an ensemble machine learning model using XGBoost. This approach leverages STR tool predictions, k-mer counts, and related features to enhance poly-T repeat length estimation. Using a sample of 1,202 participants from four cohort studies, we benchmarked our method against PCR-based measures. Our ensemble model outperformed individual STR tools, improving repeat length estimation accuracy (R² = 0.92) and achieving an accuracy rate of 93.2% with PCR-derived genotypes as the gold standard. Additionally, we validated our WGS-derived genotypes by replicating previously reported associations between TOMM40’523 variants and cognitive decline, demonstrating consistency with prior findings. Our results suggest that computational genotyping from WGS data is a scalable and reliable alternative to PCR-based assays, enabling broader investigations of TOMM40 variation in studies where WGS data is available.
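
As a rough illustration of the kind of inputs such an ensemble might consume, the sketch below derives poly-T k-mer count features from reads and scores length estimates against PCR-derived lengths with R². It is not the authors' pipeline: the function names, the k range, the feature layout, and the toy reads are all hypothetical, and the abstract does not specify how features are actually built. A gradient-boosted regressor such as XGBoost could then be trained on vectors of this shape.

```python
from collections import Counter


def longest_t_run(seq):
    """Return the length of the longest run of consecutive 'T' bases."""
    best = run = 0
    for base in seq:
        run = run + 1 if base == "T" else 0
        best = max(best, run)
    return best


def poly_t_kmer_counts(reads, k_min=10, k_max=40):
    """For each k in [k_min, k_max], count reads containing a T-run of length >= k.

    The k range is an illustrative assumption, not a value from the paper.
    """
    counts = Counter()
    for read in reads:
        longest = longest_t_run(read)
        for k in range(k_min, min(longest, k_max) + 1):
            counts[k] += 1
    return counts


def feature_vector(tool_predictions, kmer_counts, k_min=10, k_max=40):
    """Concatenate per-tool length calls (sorted by tool name) with k-mer counts.

    `tool_predictions` maps a hypothetical STR tool name to its length call.
    """
    tools = [tool_predictions[name] for name in sorted(tool_predictions)]
    kmers = [kmer_counts.get(k, 0) for k in range(k_min, k_max + 1)]
    return tools + kmers


def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against reference lengths."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy reads, invented purely for illustration.
reads = ["ACG" + "T" * 12 + "GCA", "AC" + "T" * 15 + "G", "ACGT"]
counts = poly_t_kmer_counts(reads, k_min=10, k_max=20)
features = feature_vector({"tool_a": 14, "tool_b": 15}, counts, k_min=10, k_max=20)
```

Counting reads per minimum run length, rather than only recording the single longest run, keeps some information about how read support falls off as k grows, which is one plausible way k-mer evidence could complement per-tool length calls.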
