MDSLabChemBridge: Multi-Engine Molecular Descriptor Generation and ML-Ready Feature Engineering
Abstract
Background: Molecular descriptors and fingerprints are the primary means of encoding chemical structures as numerical features that capture the physicochemical, topological, geometric, and electronic properties of molecules. This encoding is essential for a wide range of cheminformatics applications, including quantitative structure-activity relationship (QSAR/SAR) analysis, and the resulting features are widely used as machine-readable representations in statistical modeling and artificial intelligence workflows. Over the past few decades, several cheminformatics toolkits have been developed to calculate molecular descriptors and fingerprints, including RDKit, PaDEL-Descriptor, Mordred, and the Chemistry Development Kit (CDK). These tools provide extensive descriptor libraries and computational capabilities, but they are implemented in different programming environments, such as Python and Java. As a result, integrating them into a single workflow often requires complex multi-language pipelines and additional scripting. Furthermore, most of these tools rely primarily on command-line interfaces and lack a unified graphical environment, which poses challenges for researchers without extensive programming expertise.

Results: A freely available R package, MDSLabChemBridge, was developed as an integrated platform that bridges multiple descriptor engines within the R environment. The package enables descriptor calculation with RDKit, Mordred, PaDEL, and CDK through a single interface. In addition to these existing engines, MDSLabChemBridge introduces a custom descriptor calculator, MDSLab Custom, which computes additional structural and functional group-based descriptors to complement traditional descriptor libraries. The package provides a unified programmatic interface and an interactive Shiny-based graphical user interface to simplify descriptor generation and cheminformatics analysis.

Conclusion: MDSLabChemBridge provides a Shiny-based user interface with multiple data output options.
The tool is designed to generate machine learning-ready descriptor matrices, enabling seamless integration with statistical modeling and AI workflows. The package is available at https://github.com/yogesh601/MDSLabChemBridge and can be installed directly in R.
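
To make "machine learning-ready" concrete, the sketch below shows one common kind of clean-up applied to a descriptor matrix before modeling: dropping descriptor columns that contain missing or non-finite values, and dropping columns that are constant across all molecules. This is an illustrative assumption about what such preprocessing can look like, not the package's actual implementation or API; the function name `clean_descriptor_matrix`, the column names, and the toy values are all hypothetical.

```python
import math


def clean_descriptor_matrix(rows, names):
    """Return (rows, names) keeping only finite, non-constant descriptor columns.

    Hypothetical helper for illustration; not part of MDSLabChemBridge.
    """
    if not rows:
        return [], []
    keep = []
    for j in range(len(names)):
        column = [row[j] for row in rows]
        # Skip columns with missing (None), NaN, or infinite entries.
        if any(v is None or not math.isfinite(v) for v in column):
            continue
        # Skip constant columns: they carry no information for a model.
        if max(column) == min(column):
            continue
        keep.append(j)
    kept_names = [names[j] for j in keep]
    kept_rows = [[row[j] for j in keep] for row in rows]
    return kept_rows, kept_names


# Toy descriptor table (one row per molecule); values are for illustration only.
names = ["MolWt", "NumRings", "LogP", "Charge"]
rows = [
    [180.16, 1, 1.19, 0.0],
    [46.07, 0, float("nan"), 0.0],  # one descriptor failed to compute
    [78.11, 1, 2.13, 0.0],
]

matrix, kept = clean_descriptor_matrix(rows, names)
print(kept)
print(matrix)
```

In this toy input the `LogP` column is removed because one value is missing, and `Charge` is removed because it is identical for every molecule. Real workflows often go further (imputation, scaling, correlation filtering); which steps the package itself applies is not stated in the abstract.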