Optimizing Parkinson’s Disease Prediction: A Comparative Analysis of Data Aggregation Methods Using Multiple Voice Recordings via an Automated Artificial Intelligence Pipeline

Zhengxiao Yang
Hao Zhou
Sudesh Srivastav
Jeffrey G. Shaffer
Kuukua E. Abraham
Samuel M. Naandam
Samuel Kakraba

Abstract

Patient-level grouped data are prevalent in public health and medicine, and multiple instance learning (MIL) offers a framework for addressing the challenges of this data structure. This study compares four data aggregation methods for handling grouped structure in classification tasks: post-mean, post-max, post-min, and pre-mean aggregation. We developed a customized AI pipeline that combines twelve machine learning algorithms with the four aggregation methods to detect Parkinson’s disease (PD) from multiple voice recordings per individual, using a UCI Machine Learning Repository dataset of 756 voice recordings from 188 PD patients and 64 healthy individuals. Seven performance metrics were used for model evaluation: accuracy, precision, sensitivity, specificity, F1 score, area under the ROC curve (AUC), and Matthews correlation coefficient (MCC). Techniques including Bag Over-Sampling (BOS), cross-validation, and grid search were implemented to improve classification performance. Among the four aggregation methods, post-mean aggregation combined with XGBoost achieved the highest accuracy (0.880), F1 score (0.922), and MCC (0.672). We also identified potential trends that can guide the choice of aggregation method for imbalanced data, based particularly on differences in sensitivity and specificity. These findings have implications for further research on grouped, imbalanced data.
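
The abstract names the four aggregation methods without defining them. The sketch below assumes one common reading, not confirmed by this page: the "post" methods score each recording individually and then reduce each subject's recording-level predicted probabilities to a single score (mean, max, or min), while pre-mean averages each subject's feature vectors before training so the model sees one row per subject. The function names, the 0.5 threshold, and the toy values are hypothetical illustrations rather than the authors' pipeline, and the classifier itself is omitted.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical helpers illustrating the four aggregation strategies named in the
# abstract. The assumed definitions (not taken from the paper) are:
#   post-mean/max/min: reduce recording-level predicted probabilities per subject
#   pre-mean: average recording-level feature vectors per subject before training


def group_by_subject(values, subject_ids):
    """Collect per-recording values into lists keyed by subject ID (first-seen order)."""
    groups = defaultdict(list)
    for value, sid in zip(values, subject_ids):
        groups[sid].append(value)
    return dict(groups)


def post_aggregate(recording_probs, subject_ids, how="mean"):
    """Post-aggregation: one subject-level score from that subject's recording-level probabilities."""
    reducers = {"mean": mean, "max": max, "min": min}
    if how not in reducers:
        raise ValueError(f"unknown aggregation method: {how!r}")
    reduce_fn = reducers[how]
    grouped = group_by_subject(recording_probs, subject_ids)
    return {sid: reduce_fn(probs) for sid, probs in grouped.items()}


def pre_mean(recording_features, subject_ids):
    """Pre-mean aggregation: column-wise mean of each subject's feature vectors,
    giving one training row per subject."""
    grouped = group_by_subject(recording_features, subject_ids)
    return {sid: [mean(column) for column in zip(*rows)] for sid, rows in grouped.items()}


def to_labels(subject_scores, threshold=0.5):
    """Threshold subject-level scores into 0/1 labels (0.5 is an assumed cut-off)."""
    return {sid: int(score >= threshold) for sid, score in subject_scores.items()}


if __name__ == "__main__":
    # Toy data: two subjects with three recordings each (values are made up).
    ids = ["s1", "s1", "s1", "s2", "s2", "s2"]
    probs = [0.9, 0.4, 0.7, 0.2, 0.6, 0.3]
    for how in ("mean", "max", "min"):
        print(how, to_labels(post_aggregate(probs, ids, how)))
    # mean {'s1': 1, 's2': 0}
    # max {'s1': 1, 's2': 1}
    # min {'s1': 0, 's2': 0}

    features = [[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]]
    print(pre_mean(features, ["s1", "s1", "s1"]))  # {'s1': [2.0, 20.0]}
```

In this toy example, max aggregation labels both subjects positive and min labels neither. Without reproducing the study's results, this illustrates how the choice of reducer can shift the balance between sensitivity and specificity.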

Version published to 10.3390/data10010004 (Jan 2, 2025)
Version published to 10.20944/preprints202412.0366.v1 (Dec 5, 2024)
