Machine Learning Based-Prediction of Health Application Effectiveness on Google Play Store
Abstract
Objectives: This study aims to evaluate the effectiveness of health applications on the Google Play Store by analyzing app metadata using machine learning classification models. It investigates which application features (such as AI classification, app category, update status, and version) are associated with higher user ratings.

Methods: A total of 305 health-related applications were selected from the Google Play Store using keyword filters for “Health & Fitness” and “Medical.” Key metadata were extracted and preprocessed, including Classification (AI vs. Non-AI), Category, Reviews, Developer Type, Version, Release Year, and Recent Update. To address class imbalance, the SMOTE technique was applied, and three machine learning models were used to predict user ratings: Naïve Bayes, K-Nearest Neighbors (KNN), and Binomial Logistic Regression.

Results: The KNN model achieved the most balanced performance with 75.89% accuracy, 82.22% precision, and an AUC of 0.849, while Logistic Regression produced the highest precision (100%) and overall accuracy (76.32%) but lower recall (52.63%). Logistic Regression analysis also showed that apps categorized under Health & Fitness, those recently updated, and AI-based apps were more likely to receive high user ratings.

Conclusion: Machine learning models, particularly KNN and Logistic Regression, can reliably predict app effectiveness based on metadata. Regular updates, AI integration, and fitness-focused design are key factors linked to higher user approval, providing useful insights for developers and digital health stakeholders. Future research should consider larger and more diverse datasets and explore additional features (e.g., user sentiment from reviews, app permissions) to further improve model performance.
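
The Results report 100% precision alongside only 52.63% recall for Logistic Regression, a combination that is easier to read from a confusion matrix. The sketch below computes accuracy, precision, and recall from raw counts. The counts themselves are an assumption: the abstract does not give a confusion matrix or a test-set size, and tp=10, fp=0, fn=9, tn=19 is simply one hypothetical set of 38 test cases that happens to reproduce the reported percentages. The names `ConfusionMatrix`, `as_percent`, and `lr_like` are illustrative and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (positive class = high user rating)."""
    tp: int  # high-rated apps predicted as high-rated
    fp: int  # low-rated apps predicted as high-rated
    fn: int  # high-rated apps predicted as low-rated
    tn: int  # low-rated apps predicted as low-rated

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    @property
    def precision(self) -> float:
        predicted_positive = self.tp + self.fp
        return self.tp / predicted_positive if predicted_positive else 0.0

    @property
    def recall(self) -> float:
        actual_positive = self.tp + self.fn
        return self.tp / actual_positive if actual_positive else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def as_percent(value: float) -> float:
    """Convert a fraction to a percentage rounded to two decimals."""
    return round(100 * value, 2)


# ASSUMPTION: hypothetical counts, not taken from the study. They are one
# possible confusion matrix consistent with the percentages in the abstract.
lr_like = ConfusionMatrix(tp=10, fp=0, fn=9, tn=19)

print("accuracy :", as_percent(lr_like.accuracy))   # 76.32
print("precision:", as_percent(lr_like.precision))  # 100.0
print("recall   :", as_percent(lr_like.recall))     # 52.63
```

Under these assumed counts, the model never mislabels a low-rated app as high-rated (no false positives), yet it misses nearly half of the genuinely high-rated apps. That is why a perfect precision score can coexist with modest recall, and why the abstract describes KNN rather than Logistic Regression as the most balanced model.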