BEDMS: A metadata standardizer for genomic region attributes
Abstract
High-throughput sequencing technologies have generated vast amounts of omics data that annotate genomic regions. Integrating these data is challenging because the associated metadata do not follow a uniform schema, which hinders data management, discovery, interoperability, and reusability. Existing metadata standardization tools are generally limited in scope: they target specific data sets or data types and do not readily apply to custom schemas. To improve standardization of genomic interval metadata, we have developed BEDMS. We developed and evaluated several model architectures and trained models that achieved high performance on held-out data. Given a trained model, BEDMS predicts standardized metadata attributes that conform to a chosen schema. BEDMS also lets users train custom models. To demonstrate, we trained BEDMS on three different schemas, allowing users to choose which schema to standardize into. We also deployed BEDMS on PEPhub, which provides a graphical user interface for standardizing metadata without any local training or software. In conclusion, BEDMS offers a practical one-stop solution for metadata management and standardization of genomic interval data.