MLwrap: Simplifying Machine Learning workflows in R

Rafael Jiménez
Javier Martínez-García
Juan José Montaño
Albert Sesé

Abstract

MLwrap is an R package designed to make the implementation of machine learning (ML) workflows accessible, efficient, and reproducible, particularly within the framework of the Knowledge Discovery in Databases (KDD) process. The package provides a unified and minimalistic interface that covers all essential stages of predictive modeling, including data preprocessing, model construction, hyperparameter optimization, and sensitivity analysis. MLwrap supports a range of widely used algorithms, such as Multilayer Perceptron Neural Networks, Support Vector Machines, Random Forests and XGBoost Decision Trees, and incorporates evidence-based default hyperparameter ranges to facilitate robust model selection. The workflow is organized into four core functions: preprocessing(), build_model(), fine_tuning(), and sensitivity_analysis(), which together streamline the entire ML pipeline and encapsulate all steps within a single, reproducible analysis object. Through two illustrative examples using the included sim_data dataset, this paper demonstrates the application of MLwrap to both regression and classification tasks, highlighting its capacity to simplify complex workflows and provide interpretable results. The package aims to empower analysts and researchers, especially in the health, social and behavioral sciences, to efficiently extract actionable insights from data while ensuring transparency and reproducibility in their analyses. MLwrap: “Start simple, scale smart”.
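The four-stage workflow described in the abstract can be illustrated with a small, language-neutral sketch. The Python code below is not MLwrap and does not reproduce its R interface: it only borrows the four stage names from the abstract. Every signature, the `Analysis` container, the k-nearest-neighbours model, the grid of `k` values, and the synthetic data are assumptions made for illustration. It shows one analysis object being passed through preprocessing, model specification, cross-validated tuning, and a permutation-based sensitivity analysis.

```python
import random
import statistics
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """Hypothetical container carrying state from one stage to the next."""
    train: list                 # (scaled feature list, target) pairs
    test: list
    names: list                 # feature names
    model: str = ""
    grid: list = field(default_factory=list)
    best_k: int = 0
    importance: dict = field(default_factory=dict)


def preprocessing(rows, targets, names, seed=0):
    """Shuffle, split 75/25, and standardize using training statistics only."""
    data = list(zip(rows, targets))
    random.Random(seed).shuffle(data)
    cut = int(0.75 * len(data))
    train, test = data[:cut], data[cut:]
    cols = list(zip(*(x for x, _ in train)))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]

    def scale(part):
        return [([(v - m) / s for v, m, s in zip(x, means, sds)], y)
                for x, y in part]

    return Analysis(train=scale(train), test=scale(test), names=list(names))


def build_model(a, grid=(1, 3, 5, 9)):
    """Record the model type and the candidate values of k to tune over."""
    a.model = "knn"
    a.grid = list(grid)
    return a


def predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(
        train, key=lambda p: sum((u - v) ** 2 for u, v in zip(p[0], x)))[:k]
    return statistics.fmean([t for _, t in nearest])


def rmse(train, data, k):
    return statistics.fmean(
        [(predict(train, x, k) - y) ** 2 for x, y in data]) ** 0.5


def fine_tuning(a, folds=3):
    """Grid search: keep the k with the lowest mean cross-validated RMSE."""
    def cv_error(k):
        errors = []
        for i in range(folds):
            val = a.train[i::folds]
            fit = [p for j, p in enumerate(a.train) if j % folds != i]
            errors.append(rmse(fit, val, k))
        return statistics.fmean(errors)

    a.best_k = min(a.grid, key=cv_error)
    return a


def sensitivity_analysis(a, seed=1):
    """Permutation importance: RMSE increase after shuffling one feature."""
    rng = random.Random(seed)
    base = rmse(a.train, a.test, a.best_k)
    for j, name in enumerate(a.names):
        col = [x[j] for x, _ in a.test]
        rng.shuffle(col)
        permuted = [(x[:j] + [v] + x[j + 1:], y)
                    for (x, y), v in zip(a.test, col)]
        a.importance[name] = rmse(a.train, permuted, a.best_k) - base
    return a


# Synthetic data (assumption): the target depends on x1 only; x2 is noise.
rng = random.Random(42)
rows = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(80)]
targets = [3.0 * x1 + rng.gauss(0, 0.5) for x1, _ in rows]

analysis = preprocessing(rows, targets, ["x1", "x2"])
analysis = build_model(analysis)
analysis = fine_tuning(analysis)
analysis = sensitivity_analysis(analysis)
print(analysis.best_k, analysis.importance)
```

Keeping every intermediate result on a single object mirrors the abstract's point about encapsulating all steps in one reproducible analysis object. The package's actual arguments and defaults should be taken from its own documentation.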

Version published to 10.31234/osf.io/j6m4z_v1 on OSF Preprints
Nov 4, 2025

Related articles

From Prompt to Pipeline: Large Language Models for Scientific Workflow Development in Bioinformatics

This article has 2 authors:
1. Khairul Alam
2. Banani Roy
This article has no evaluations. Latest version: Oct 10, 2025

omicML: An Integrative Bioinformatics and Machine Learning Framework for Transcriptomic Biomarker Identification

This article has 9 authors:
1. Joy Prokash Debnath
2. Kabir Hossen
3. Md. Sayeam Khandaker
4. Shawon Majid
5. Md Mehrajul Islam
6. Siam Arefin
7. Preonath Chondrow Dev
8. Saifuddin Sarker
9. Tanvir Hossain
This article has no evaluations. Latest version: Oct 27, 2025

Stacking Ensemble Learning: Combining XGBoost, LightGBM, CatBoost, and AdaBoost with Random Forest Meta Model

This article has 1 author:
1. Sindhu
This article has no evaluations. Latest version: Oct 30, 2025
