HAMAP as SPARQL rules—A portable annotation pipeline for genomes and proteomes
This article has been reviewed by the following groups
Listed in
- Evaluated articles (GigaScience)
Abstract
Background
Genome and proteome annotation pipelines are generally custom built and not easily reusable by other groups. This leads to duplication of effort, increased costs, and suboptimal annotation quality. One way to address these issues is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation.
Results
Here we demonstrate one approach to generate portable genome and proteome annotation pipelines that users can run without recourse to custom software. This proof of concept uses our own rule-based annotation pipeline HAMAP, which provides functional annotation for protein sequences to the same depth and quality as UniProtKB/Swiss-Prot, and the World Wide Web Consortium (W3C) standards Resource Description Framework (RDF) and SPARQL (a recursive acronym for the SPARQL Protocol and RDF Query Language). We translate complex HAMAP rules into the W3C standard SPARQL 1.1 syntax, and then apply them to protein sequences in RDF format using freely available SPARQL engines. This approach supports the generation of annotation that is identical to that generated by our own in-house pipeline, using standard, off-the-shelf solutions, and is applicable to any genome or proteome annotation pipeline.
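To make the translation step concrete, the sketch below renders a deliberately simplified annotation rule as a SPARQL 1.1 query string. It is an illustration under stated assumptions, not the HAMAP rule format: the `Rule` fields, the `ex:` namespace and its predicates, and the choice of a `CONSTRUCT` query are all hypothetical and chosen only to show the general idea of expressing "if a protein meets these conditions, attach this annotation" as a standard query.

```python
from dataclasses import dataclass

# Hypothetical namespace used only for this illustration; it is not the
# vocabulary of the published HAMAP SPARQL rules.
EX = "http://example.org/annotation#"


@dataclass(frozen=True)
class Rule:
    """A toy annotation rule: two conditions and one annotation to attach."""
    rule_id: str       # hypothetical rule identifier
    signature: str     # family signature the protein must match
    taxon: str         # taxon the source organism must belong to
    protein_name: str  # annotation produced when both conditions hold


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rule_to_sparql(rule: Rule) -> str:
    """Render the toy rule as a SPARQL 1.1 CONSTRUCT query (assumed form)."""
    return "\n".join([
        f"PREFIX ex: <{EX}>",
        "CONSTRUCT {",
        f'  ?protein ex:recommendedName "{escape_literal(rule.protein_name)}" ;',
        f'           ex:annotatedBy "{escape_literal(rule.rule_id)}" .',
        "}",
        "WHERE {",
        f'  ?protein ex:matchesSignature "{escape_literal(rule.signature)}" ;',
        f'           ex:inTaxon "{escape_literal(rule.taxon)}" .',
        "}",
    ])


if __name__ == "__main__":
    demo = Rule("RULE_0001", "SIG_0001", "Bacteria", "Example protein")
    print(rule_to_sparql(demo))
```

A query produced this way could be submitted to any SPARQL 1.1 engine that holds the protein data as RDF. The actual rules and the vocabulary they use are those distributed on the HAMAP FTP site.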
Conclusions
HAMAP SPARQL rules are freely available for download from the HAMAP FTP site, ftp://ftp.expasy.org/databases/hamap/sparql/, under the CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license. A tutorial and supplementary code to use HAMAP as SPARQL are available on GitHub at https://github.com/sib-swiss/HAMAP-SPARQL, and general documentation about HAMAP can be found on the HAMAP website at https://hamap.expasy.org.
Article activity feed
Now published in GigaScience doi: 10.1093/gigascience/giaa003
Jerven Bolleman (1; for correspondence: jerven.bolleman@sib.swiss), Eduoard de Castro (1), Delphine Baratin (1), Sebastien Gehant (1), Beatrice A. Cuche (1), Andrea H. Auchincloss (1), Elisabeth Coudert (1), Chantal Hulo (1), Patrick Masson (1), Ivo Pedruzzi (1), Catherine Rivoire (1), Ioannis Xenarios (1, 2), Nicole Redaschi (1), Alan Bridge (1)
1 Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH-1211 Geneva 4, Switzerland
2 CHUV/LICR, Agora Centre, CH-1005 Lausanne, Switzerland
A version of this preprint has been published in the open-access journal GigaScience (https://doi.org/10.1093/gigascience/giaa003), where the paper and its peer reviews are published openly under a CC-BY 4.0 license.
The peer reviews are available at:
- Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102090
- Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102091