MDSLabChemBridge: Multi-Engine Molecular Descriptor Generation and ML-Ready Feature Engineering


Abstract

Background: Molecular descriptors and fingerprints are the primary means of encoding chemical structures as numerical features that capture the physicochemical, topological, geometric, and electronic properties of molecules. Such encodings are essential for a wide range of cheminformatics applications, including quantitative structure-activity relationship (QSAR/SAR) analysis, and are widely used as machine-readable representations in statistical modeling and artificial intelligence workflows. Over the past few decades, several cheminformatics toolkits have been developed to calculate molecular descriptors and fingerprints, including RDKit, PaDEL-Descriptor, Mordred, and the Chemistry Development Kit (CDK). These tools provide extensive descriptor libraries and computational capabilities, but they are implemented in different programming environments, such as Python and Java. As a result, integrating them into a single workflow often requires complex multi-language pipelines and additional scripting. Furthermore, most of these tools rely primarily on command-line interfaces and lack a unified graphical environment, which poses challenges for researchers without extensive programming expertise.

Results: A freely available R package, MDSLabChemBridge, was developed as an integrated platform that bridges multiple descriptor engines within the R environment. The package supports descriptor calculation with RDKit, Mordred, PaDEL, and CDK. In addition to these existing engines, MDSLabChemBridge introduces a custom descriptor calculator, MDSLab Custom, which computes additional structural and functional group-based descriptors to complement traditional descriptor libraries. The package provides a unified interface and an interactive Shiny-based graphical user interface to simplify descriptor generation and cheminformatics analysis.

Conclusion: MDSLabChemBridge provides a Shiny-based user interface with multiple data output options. The tool is designed to generate machine learning-ready descriptor matrices that can be passed directly to statistical modeling and AI workflows. The package is available at https://github.com/yogesh601/MDSLabChemBridge and can be installed directly in R.
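
The abstract does not specify what "machine learning-ready" means for a descriptor matrix. As a rough illustration only, the sketch below assumes it means one row per molecule, engine-prefixed column names, and no missing, non-finite, or constant columns. The function `merge_descriptor_tables`, the input layout, the descriptor names, and all values are hypothetical toy examples; none of this is the MDSLabChemBridge API or output.

```python
import math


def merge_descriptor_tables(tables):
    """Merge per-engine descriptor tables into one numeric matrix.

    tables maps engine name -> molecule ID -> {descriptor name: value}.
    Returns (molecule_ids, column_names, rows).
    Hypothetical helper for illustration; not part of any real package.
    """
    # Keep only molecules that every engine produced output for.
    common = None
    for per_molecule in tables.values():
        ids = set(per_molecule)
        common = ids if common is None else common & ids
    mol_ids = sorted(common or [])

    # Collect (engine, descriptor) pairs so names from different
    # engines cannot clash once they are flattened into one table.
    candidates = []
    for engine in sorted(tables):
        names = set()
        for mol in mol_ids:
            names.update(tables[engine][mol])
        candidates.extend((engine, name) for name in sorted(names))

    def is_valid(value):
        return (
            isinstance(value, (int, float))
            and not isinstance(value, bool)
            and math.isfinite(value)
        )

    kept = []
    for engine, name in candidates:
        values = [tables[engine][mol].get(name) for mol in mol_ids]
        if not all(is_valid(v) for v in values):
            continue  # drop columns with missing or non-finite entries
        if len(set(values)) <= 1:
            continue  # drop constant columns, which carry no signal
        kept.append((engine, name))

    col_names = [f"{engine}_{name}" for engine, name in kept]
    rows = [
        [float(tables[engine][mol][name]) for engine, name in kept]
        for mol in mol_ids
    ]
    return mol_ids, col_names, rows


# Toy input: made-up values, not real engine output.
tables = {
    "rdkit": {
        "m1": {"MolWt": 46.1, "const": 1.0},
        "m2": {"MolWt": 78.1, "const": 1.0},
        "m3": {"MolWt": 18.0, "const": 1.0},
    },
    "cdk": {
        "m1": {"XLogP": -0.2, "bad": float("nan")},
        "m2": {"XLogP": 2.0, "bad": 1.5},
    },
}

mol_ids, col_names, rows = merge_descriptor_tables(tables)
print(mol_ids, col_names, rows)
```

In this sketch, molecule `m3` is dropped because only one engine processed it, and the `const` and `bad` columns are filtered out. Dropping rather than imputing missing values is a design choice made here for simplicity; a real workflow might impute instead.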