PDBCleanV2: A Python Library for Generating Consistent Structure Datasets
Abstract
The number of structures in the Protein Data Bank has grown rapidly in recent years. Nonetheless, comparing structures of a specific system is often challenging because of inconsistent nomenclature, structures originating from different species, the presence of various ligands, and missing atoms or chains. To address these issues, we developed PDBCleanV2, a Python package that lets users create consistent datasets of structures and thereby simplifies comparisons among them. The library generates an individual file for each molecule in a structure file, corrects labeling errors, and standardizes chain names and numbering. Our aim is to provide researchers with consistent datasets that streamline their analyses. The source code, installation instructions, and tutorials are available in PDBCleanV2’s GitHub repository (https://github.com/fatipardo/PDBClean-0.0.2/) and on Zenodo (https://doi.org/10.5281/zenodo.14014241).
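To make "standardizes chain names and numbering" concrete, here is a minimal sketch of that kind of operation on PDB-format text in plain Python. It is not PDBCleanV2's implementation or API: `standardize_chains`, `make_atom_line`, and the `chain_map` argument are hypothetical names introduced for illustration. The sketch assumes the fixed-column PDB format, where the chain ID occupies column 22 and the residue sequence number occupies columns 23 to 26. It renames chains through a user-supplied mapping and renumbers residues consecutively from 1 within each new chain.

```python
from typing import Dict, List, Tuple


def standardize_chains(lines: List[str], chain_map: Dict[str, str]) -> List[str]:
    """Hypothetical helper (not part of PDBCleanV2).

    Renames chain IDs via chain_map and renumbers residues from 1
    within each resulting chain. Only ATOM/HETATM records are changed;
    all other records (including TER) are passed through unchanged.
    """
    out: List[str] = []
    counters: Dict[str, int] = {}                      # last residue number per new chain
    last_key: Dict[str, Tuple[str, str, str]] = {}     # last residue seen per new chain
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            out.append(line)
            continue
        old_chain = line[21]                           # column 22: chain ID
        new_chain = chain_map.get(old_chain, old_chain)
        if len(new_chain) != 1:
            raise ValueError(f"chain ID must be one character: {new_chain!r}")
        # A residue is identified by its original chain, number (cols 23-26)
        # and insertion code (col 27); a change starts a new residue.
        res_key = (old_chain, line[22:26], line[26])
        if last_key.get(new_chain) != res_key:
            counters[new_chain] = counters.get(new_chain, 0) + 1
            last_key[new_chain] = res_key
        new_num = f"{counters[new_chain]:>4d}"
        out.append(line[:21] + new_chain + new_num + line[26:])
    return out


def make_atom_line(serial: int, resseq: int, chain: str, name: str = "CA") -> str:
    """Hypothetical helper: build a minimal fixed-column ATOM record."""
    return (f"ATOM  {serial:5d} {name:<4s} ALA {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


if __name__ == "__main__":
    pdb_lines = [
        make_atom_line(1, 10, "X", "N"),
        make_atom_line(2, 10, "X", "CA"),
        make_atom_line(3, 11, "X", "N"),
        "END",
    ]
    for rec in standardize_chains(pdb_lines, {"X": "A"}):
        print(rec)
```

Running the demo relabels chain `X` as `A` and turns residues 10 and 11 into residues 1 and 2. A real curation workflow would also need to decide how the mapping is chosen (for example, by molecule identity across many entries) and how to handle TER records and mmCIF files, which this sketch deliberately leaves out.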