EnSCAN: ENsemble Scoring for prioritizing CAusative variaNts across multi-platform GWAS for Late-Onset Alzheimer's Disease

Abstract

Late-Onset Alzheimer's Disease (LOAD) is a progressive and complex neurodegenerative disorder of the aging population. LOAD is characterized by cognitive decline, including deterioration of memory, loss of intellectual abilities, and impairment in other cognitive domains, depending on traumatic brain injuries. The genetic landscape of Alzheimer's Disease (AD) is complex and remains elusive, which hinders the early and differential diagnosis of LOAD. While Genome-Wide Association Studies (GWAS) enable the examination of statistical interactions among individual variants within specific loci, traditional univariate analysis may overlook intricate relationships between these genetic elements. Machine learning (ML) algorithms, on the other hand, can uncover concealed, novel, and significant patterns by considering nonlinear interactions among variants, which improves our understanding of the genetic predisposition underlying complex genetic disorders. When datasets come from different genotyping platforms, however, majority voting cannot be applied because the attributes differ between platforms. Hence, a new post-ML ensemble approach is developed to select significant single-nucleotide variants (SNVs) across multiple genotyping platforms. We propose the EnSCAN framework, which uses a new algorithm to ensemble the variants selected on different platforms and prioritize candidate causative loci; this in turn helps improve ML results by combining the prior information captured by each of the multiple models trained on each dataset. The proposed ensemble algorithm uses the chromosomal locations of SNVs, mapped to cytogenetic bands, together with the proximities between pairs of SNVs and multi-model Random Forest validations, to prioritize SNVs and candidate causative genes for Alzheimer's Disease. The scoring method is scalable and can be applied to any multi-platform genotyping study. We present how the proposed EnSCAN scoring algorithm prioritizes candidate causative variants related to LOAD across three GWAS datasets.
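
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea: variants selected on different platforms cannot be matched by attribute, but they can be pooled by cytogenetic band and compared by genomic distance. The function names (`score_bands`, `close_pairs`), the weighting by number of supporting platforms, the 50 kb window, and the toy SNV identifiers, positions, and importances are all assumptions made for illustration; they are not the EnSCAN algorithm or its data.

```python
from collections import defaultdict
from itertools import combinations

# Toy, made-up input: SNVs selected by some ML model on each platform.
# Each hit is (snv_id, chromosome, position_bp, cytoband, importance).
toy_hits = {
    "platform_A": [
        ("snv_a1", "19", 44_900_000, "19q13.32", 0.5),
        ("snv_a2", "2", 127_000_000, "2q14.3", 0.25),
    ],
    "platform_B": [
        ("snv_b1", "19", 44_910_000, "19q13.32", 0.25),
        ("snv_b2", "11", 60_000_000, "11q12.2", 0.5),
    ],
}


def score_bands(platform_hits):
    """Pool per-platform SNV importances by cytogenetic band.

    Assumed weighting (illustrative only): the summed importance of a band
    is multiplied by the number of platforms that support it.
    """
    band_importance = defaultdict(float)
    band_platforms = defaultdict(set)
    for platform, hits in platform_hits.items():
        for _snv_id, _chrom, _pos, band, importance in hits:
            band_importance[band] += importance
            band_platforms[band].add(platform)
    scores = {
        band: band_importance[band] * len(band_platforms[band])
        for band in band_importance
    }
    # Highest-scoring bands first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def close_pairs(platform_hits, max_distance_bp=50_000):
    """List SNV pairs from different platforms lying close together.

    A pair qualifies when both SNVs are on the same chromosome and at most
    max_distance_bp apart (the window size is an assumed parameter).
    """
    pairs = []
    for (_, hits1), (_, hits2) in combinations(platform_hits.items(), 2):
        for id1, chrom1, pos1, _, _ in hits1:
            for id2, chrom2, pos2, _, _ in hits2:
                distance = abs(pos1 - pos2)
                if chrom1 == chrom2 and distance <= max_distance_bp:
                    pairs.append((id1, id2, distance))
    return pairs


if __name__ == "__main__":
    for band, score in score_bands(toy_hits):
        print(band, score)
    print(close_pairs(toy_hits))
```

In this toy input, the band supported by both platforms ranks first, and the only cross-platform pair inside the window is the one on chromosome 19. A real analysis would take the importances from validated models, such as the Random Forest validations mentioned above, rather than from hand-written numbers.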
