The broom of the system: a harmonized contextual data specification for One Health AMR pathogen genomic surveillance
Abstract
One Health genomics initiatives often involve data streams originating from different sources, institutions, sectors, and information management systems. These are often heterogeneous datasets structured in a variety of ways, posing challenges for data harmonization, integration and meaningful interpretation. The Genomics Research and Development Initiative Shared Priority Projects for AMR (GRDI-AMR) uses a genomics-based approach to understand the prevalence and diversity of antimicrobial resistance determinants associated with food production and different environments that can impact human health, as well as how AMR can evolve, spread, and be mitigated. This work is being carried out by six different federal government departments and agencies, academic institutions, as well as agricultural and environmental networks. To facilitate harmonization of data, a modular, interoperable contextual data (metadata) specification was developed, called the GRDI-AMR One Health specification package. The package consists of an ontology-based data standard, built using semantic best practices and existing standards, and is operationalized in a data curation tool called the DataHarmonizer. This tool automates the transformation of contextual data into NCBI’s One Health Enterics BioSample format to support public data sharing. The package also includes different kinds of support materials such as field and term reference guides and a detailed curation protocol highlighting ethical, practical and privacy considerations. Tooling and vocabulary were iteratively improved through multiple rounds of real-world testing. The data standard is continually maintained and version controlled, and has been used to resolve a variety of data harmonization issues experienced throughout numerous collaborative surveillance projects. 
The standard also encourages the inclusion of prevalence metrics in order to make whole genome sequencing data more useful for risk assessment, and enables communication about data needs between data generators and users. While developed for Canadian surveillance, the GRDI-AMR specification has also been implemented in international harmonization efforts, demonstrating its utility for many types of One Health genomics projects. The specification package is available at https://github.com/cidgoh/GRDI_AMR_One_Health.