Inferring Sex, Ethnicity, and Age from RNA-seq Data

Abstract

RNA sequencing provides a comprehensive snapshot of gene expression, reflecting both genetic inheritance and dynamic environmental influences. This study explores the predictive power of RNA-seq data combined with machine learning and model-interpretation techniques, such as Gradient Boosting Machines, Support Vector Regression, and SHapley Additive exPlanations (SHAP), to infer complex human traits, including biological sex, age, and ethnicity, across diverse tissues. Using RNA-seq datasets derived from blood, heart, and several brain regions, we achieved near-perfect accuracy in sex determination, emphasizing the critical roles of sex chromosome-linked genes (XIST, KDM5D, EIF1AY). Age prediction demonstrated high tissue-specific precision and identified transcripts indicative of biological aging, particularly those involved in DNA repair and inflammation, which offer promising biomarkers for aging-related diseases and research. Ethnicity prediction from RNA-seq effectively distinguished closely related populations (e.g., British vs. Utah residents of Northern European descent), surpassing SNP-based approaches by capturing rapid, environment-driven transcriptional adaptations in immune-related genes (IL2RA, FOXO4). Integrating RNA-seq with genomic data further enhanced prediction accuracy, revealing nuanced population-specific transcriptomic signatures shaped by genetic ancestry and environmental factors. Our findings underscore RNA-seq's significant potential for precision medicine, highlighting critical biomarkers and pathways that may guide personalized healthcare, anti-aging strategies, disease risk assessment, and targeted therapeutic interventions.
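
To illustrate why sex chromosome-linked genes carry so much signal for sex determination, the sketch below scores a sample by comparing XIST (expressed from the inactivated X chromosome) against the Y-linked genes KDM5D and EIF1AY. It is a simple threshold rule written for illustration, not the gradient boosting models the abstract refers to. The function names, pseudocount, threshold, and expression values are all hypothetical assumptions and are not taken from the study.

```python
import math

# Y-linked marker genes named in the abstract.
Y_LINKED = ("KDM5D", "EIF1AY")


def sex_score(expr, pseudocount=1.0):
    """Return log2(XIST) minus the mean log2 expression of the Y-linked genes.

    `expr` maps gene symbols to non-negative expression values (e.g. TPM).
    The pseudocount (an assumed value) avoids log2(0) for unexpressed genes.
    """
    xist = math.log2(expr.get("XIST", 0.0) + pseudocount)
    y_mean = sum(
        math.log2(expr.get(gene, 0.0) + pseudocount) for gene in Y_LINKED
    ) / len(Y_LINKED)
    return xist - y_mean


def predict_sex(expr, threshold=0.0):
    """Call 'female' when XIST dominates the Y-linked genes, else 'male'.

    The threshold of 0.0 is an illustrative assumption, not a tuned value.
    """
    return "female" if sex_score(expr) > threshold else "male"


# Made-up expression profiles for demonstration only.
samples = {
    "sample_A": {"XIST": 85.0, "KDM5D": 0.1, "EIF1AY": 0.0},
    "sample_B": {"XIST": 0.2, "KDM5D": 30.0, "EIF1AY": 25.0},
}

for name, expr in samples.items():
    print(name, round(sex_score(expr), 2), predict_sex(expr))
```

A learned classifier can go beyond a rule like this by weighting many genes jointly and handling tissue-specific expression ranges, which a fixed threshold does not.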
