MLwrap: Simplifying Machine Learning workflows in R
Abstract
MLwrap is an R package designed to make the implementation of machine learning (ML) workflows accessible, efficient, and reproducible, particularly within the framework of the Knowledge Discovery in Databases (KDD) process. The package provides a unified and minimalistic interface that covers all essential stages of predictive modeling, including data preprocessing, model construction, hyperparameter optimization, and sensitivity analysis. MLwrap supports a range of widely used algorithms, such as Multilayer Perceptron Neural Networks, Support Vector Machines, Random Forests and XGBoost Decision Trees, and incorporates evidence-based default hyperparameter ranges to facilitate robust model selection. The workflow is organized into four core functions: preprocessing(), build_model(), fine_tuning(), and sensitivity_analysis(), which together streamline the entire ML pipeline and encapsulate all steps within a single, reproducible analysis object. Through two illustrative examples using the included sim_data dataset, this paper demonstrates the application of MLwrap to both regression and classification tasks, highlighting its capacity to simplify complex workflows and provide interpretable results. The package aims to empower analysts and researchers, especially in the health, social and behavioral sciences, to efficiently extract actionable insights from data while ensuring transparency and reproducibility in their analyses. MLwrap: “Start simple, scale smart”.
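
The four-function workflow described above might be sketched as follows for a regression task. This is a minimal illustration rather than the package's documented usage: only the function names and the sim_data dataset come from the abstract. The argument names (df, formula, task, model_name, hyperparameters, tuner, metrics, methods, metric), the option strings, and the column names in the formula are all assumptions and should be checked against the package help pages and names(sim_data) before running.

```r
# Sketch only: argument names, option strings, and column names below are
# assumptions, not confirmed by the abstract. Check ?preprocessing,
# ?build_model, ?fine_tuning, ?sensitivity_analysis and names(sim_data).
library(MLwrap)

# 1. Preprocessing: declare the data, the model formula, and the task type.
#    The outcome and predictor names here are assumed to exist in sim_data.
analysis_object <- preprocessing(
  df      = sim_data,
  formula = psych_well ~ depression + emot_intel + resilience + life_sat,
  task    = "regression"
)

# 2. Model construction: choose an algorithm and, optionally, fix some
#    hyperparameters (anything left unset would fall back to the defaults).
analysis_object <- build_model(
  analysis_object = analysis_object,
  model_name      = "Random Forest",
  hyperparameters = list(trees = 100)
)

# 3. Hyperparameter optimization: pick a tuning strategy and a metric.
analysis_object <- fine_tuning(
  analysis_object = analysis_object,
  tuner           = "Bayesian Optimization",
  metrics         = "rmse"
)

# 4. Sensitivity analysis: here, permutation feature importance.
analysis_object <- sensitivity_analysis(
  analysis_object = analysis_object,
  methods         = "PFI",
  metric          = "rmse"
)
```

Each call takes the analysis object and returns an updated version of it, which is how the whole pipeline ends up encapsulated in a single, reproducible object. A classification run would follow the same four steps with a categorical outcome and a classification task setting.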