Linguistic and Acoustic Biomarkers from Simulated Speech Reveal Early Cognitive Impairment Patterns in Alzheimer’s Disease

Akshita Debnath
Souhrid Sarkar


Abstract

Background

Alzheimer’s disease (AD) causes progressive decline in language and cognition. Automated speech analysis has emerged as a promising screening tool, yet the scarcity of clinical data limits progress. To address this, we generated a large-scale simulated speech dataset to model linguistic and acoustic deterioration across three cognitive stages: Control, Mild Cognitive Impairment (MCI), and AD.

Methods

Using Monte Carlo simulations, we emulated the Pitt DementiaBank “Cookie Theft” narratives. Acoustic features (speech rate, pause duration, jitter, shimmer) and linguistic features (type–token ratio, unique-word count, filler usage) were sampled synthetically from distributions derived from real-world DementiaBank data. We trained an XGBoost classifier to distinguish the diagnostic groups and applied SHAP (SHapley Additive exPlanations) to assess feature importance.
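The sampling step described above can be illustrated with a minimal sketch. It assumes each feature is drawn independently from a Gaussian per diagnostic group. The abstract does not state the distribution family, the parameters, or the sample sizes, so every number in `ASSUMED_PARAMS`, as well as the helper names `simulate_group`, `simulate_dataset`, and `type_token_ratio`, is hypothetical and chosen only for illustration. Only three of the seven features are shown.

```python
import random
import statistics

# Hypothetical (mean, standard deviation) per feature and group.
# These are illustrative placeholders, NOT values from DementiaBank or the preprint.
ASSUMED_PARAMS = {
    "Control": {"speech_rate": (150.0, 15.0), "pause_duration": (0.5, 0.10), "type_token_ratio": (0.60, 0.05)},
    "MCI":     {"speech_rate": (130.0, 18.0), "pause_duration": (0.8, 0.15), "type_token_ratio": (0.52, 0.06)},
    "AD":      {"speech_rate": (105.0, 20.0), "pause_duration": (1.2, 0.25), "type_token_ratio": (0.42, 0.07)},
}


def simulate_group(group, n, rng):
    """Draw n synthetic speakers for one group, one independent Gaussian per feature."""
    rows = []
    for _ in range(n):
        # Clip at zero because rates, durations, and ratios cannot be negative.
        row = {name: max(0.0, rng.gauss(mu, sigma))
               for name, (mu, sigma) in ASSUMED_PARAMS[group].items()}
        row["group"] = group
        rows.append(row)
    return rows


def simulate_dataset(n_per_group=1000, seed=0):
    """Build a balanced synthetic dataset covering all three groups."""
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    data = []
    for group in ASSUMED_PARAMS:
        data.extend(simulate_group(group, n_per_group, rng))
    return data


def type_token_ratio(transcript):
    """Unique words divided by total words, using naive whitespace tokenisation."""
    tokens = transcript.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


dataset = simulate_dataset()
for group in ASSUMED_PARAMS:
    values = [row["type_token_ratio"] for row in dataset if row["group"] == group]
    print(group, round(statistics.mean(values), 3))
```

Independent Gaussians ignore correlations between features (for example, between pause duration and speech rate), so a fuller simulation would need a joint model. The resulting table of rows is the kind of input a gradient-boosted classifier would be trained on.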

Results

The model achieved high discriminative performance (AUC ≈ 0.94; accuracy ≈ 85%). Compared to controls, simulated MCI and AD groups showed progressive declines in fluency and lexical diversity, and increases in disfluencies and voice instability. SHAP analysis revealed that key predictors included reduced type–token ratio, higher pause and filler rates, and elevated jitter/shimmer. Classification was most accurate for Control vs. AD; misclassifications of MCI cases reflected that group's intermediate feature profile.
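For readers unfamiliar with the two reported metrics, the following sketch shows how AUC and accuracy are computed in the binary case. The abstract does not say how AUC was aggregated across the three groups, so the binary setup (for example, Control vs. AD) is an assumption, and the labels and scores below are toy values, not results from the preprint.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a fixed score threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Toy data: 0 = Control, 1 = AD; scores are hypothetical model probabilities.
labels = [0, 0, 0, 1, 1, 1]
scores = [0.10, 0.40, 0.65, 0.35, 0.80, 0.90]

print(round(roc_auc(labels, scores), 3))   # 7 of 9 positive/negative pairs ranked correctly
print(round(accuracy(labels, scores), 3))  # 4 of 6 correct at threshold 0.5
```

AUC depends only on how the scores rank the examples, while accuracy depends on the chosen threshold, which is why the two figures can differ noticeably for the same model.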

Interpretation

Our framework, FMN (Forget Me Not), captures clinically relevant speech changes using simulated data, offering an explainable and scalable approach for cognitive screening. While not a substitute for real datasets, FMN validates a pipeline that mirrors known AD markers and can guide future real-world deployments. External validation remains a key next step for translational impact.

Version published to 10.64898/2026.04.08.717162 on bioRxiv
Apr 8, 2026

Related articles

Normative Speech Modeling for ALS Diagnosis with Application to Other Neurodegenerative Diseases

This article has 1 author:
1. Mithil Shah
This article has no evaluations. Latest version May 27, 2026

Deep Learning and Machine Learning for Early Detection of Alzheimer’s Disease: A Systematic Review and Meta-Analysis

This article has 1 author:
1. Saketh Machiraju
This article has no evaluations. Latest version May 22, 2026

Predicting the timing of first sustained cognitive worsening in Alzheimer’s disease using real-world clinical data and machine learning

This article has 14 authors:
1. Shruthi Venkatesh
2. Sinian Zhang
3. Wen Zhu
4. Michele Morris
5. Rocco Mercurio
6. Sarah B Berman
7. Hansruedi Mathys
8. Abby L Olsen
9. C. Elizabeth Shaaban
10. Shyam Visweswaran
11. Oscar L Lopez
12. Tianxi Cai
13. Jue Hou
14. Zongqi Xia
This article has no evaluations. Latest version Jun 4, 2026
