High Data Quality Enhances Microplastic Toxicity Prediction

Ana Antonio Vital
Scott Coffin
Andrea Bonisoli-Alquati
Maaike Vercauteren
Luan de Souza Leite
Maximilian Pichler
Magdalena Mair

Abstract

Unlike chemicals, microplastics (MPs) lack standardized identifiers, limiting the applicability of traditional predictive ecotoxicology methods such as quantitative structure-activity relationship (QSAR) models. This study aimed to predict MP toxicity using MP properties, MP concentration, organismal traits, endpoints, and experimental design, and to evaluate how data pre-processing, dataset size, and quality influence model performance. We applied the Boosted Regression Tree (BRT) machine learning algorithm to four datasets derived from the Toxicity of Microplastics Explorer database (ToMEx 2.0): (i) imputed missing values, (ii) complete-case (missing values removed), (iii) high-quality data, and (iv) low-quality data. The high-quality dataset yielded the best final predictions for both random cross-validation (AUC = 0.93) and blocked cross-validation by particle identifier (AUC = 0.87). Explainable artificial intelligence (xAI) analyses showed that predictive performance was primarily determined by endpoints and concentration, with MP properties contributing despite limited reporting. Our findings demonstrate the feasibility of machine learning to predict and identify key drivers of MP toxicity, highlighting that high-quality data improves predictive performance while reducing data mining and computational costs. Standardized experiments, detailed MP characterization, and high reporting standards would better support risk assessment frameworks and inform the design of safer materials.
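The abstract contrasts random cross-validation with cross-validation blocked by particle identifier and reports performance as AUC. The sketch below illustrates only those two ideas in plain Python; it is not the authors' code and does not fit a Boosted Regression Tree. The function names (`blocked_folds`, `auc`), the particle identifiers, and the toy labels and scores are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def blocked_folds(group_ids, n_folds):
    """Assign every row to a fold so that rows sharing a group id stay together.

    Groups are dealt out largest-first to whichever fold is currently smallest,
    which keeps fold sizes roughly balanced.
    """
    rows_by_group = defaultdict(list)
    for row, gid in enumerate(group_ids):
        rows_by_group[gid].append(row)

    fold_sizes = [0] * n_folds
    fold_of_row = [None] * len(group_ids)
    # Sort by size (descending), then by id, so the result is deterministic.
    ordered = sorted(rows_by_group.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))
    for gid, rows in ordered:
        target = fold_sizes.index(min(fold_sizes))
        for row in rows:
            fold_of_row[row] = target
        fold_sizes[target] += len(rows)
    return fold_of_row


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive outscores a random negative (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical particle identifiers, one per toxicity record (assumption).
particle_ids = ["PS-1", "PS-1", "PE-2", "PE-2", "PE-2", "PVC-3", "PET-4", "PET-4"]
folds = blocked_folds(particle_ids, n_folds=2)

# No particle identifier is split across folds.
for pid in set(particle_ids):
    assert len({f for f, p in zip(folds, particle_ids) if p == pid}) == 1

# Toy labels (1 = effect observed) and model scores, purely illustrative.
print(auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

Under random cross-validation, records for the same particle can land in both training and test folds, so a model may be rewarded for recognizing a particle it has already seen. Blocking by particle identifier removes that overlap, which is one plausible reading of why the blocked AUC reported above (0.87) is lower than the random one (0.93).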

Version published to 10.32942/x2c96d
Mar 23, 2026

Related articles

PhytoExtractQSAR: An Automated Pipeline for Literature-Mined Modeling of Phytochemical Extraction Outcomes with Transparent Generalization Assessment
Author: Sharhabil Eltahir
No evaluations. Latest version: Mar 9, 2026

Morphological Distribution of Aquatic Microplastics and Their Potential Implications for Liver Disease Pathogenesis
Author: Muhammad Adil Malik
No evaluations. Latest version: Apr 7, 2026

Artificial Neural Networks as a Decision-Support System for Predicting the Quality Attributes of Thermally Modified Wood
Authors: Özlem BOZKURT, Günay ÖZBAY
No evaluations. Latest version: Apr 1, 2026
