mafsmith: a Rust reimplementation of vcf2maf

Abstract

The Mutation Annotation Format (MAF) is a standard interchange format for somatic variant data in tumor genomics. Converting variant call format (VCF) files to MAF requires functional annotation (through tools such as the Ensembl Variant Effect Predictor, VEP) as well as complex allele-normalisation and field-mapping logic. The gold-standard implementation, vcf2maf, is written in Perl; translating it to a newer language and adding support for parallel processing could make it considerably more computationally efficient. Here we describe mafsmith, a reimplementation of vcf2maf in Rust. mafsmith reimplements the allele-normalisation and field-mapping logic of vcf2maf and uses fastVEP for annotation, achieving field-for-field identical output across fifteen validated caller types and formats spanning germline, somatic, structural variant, and annotation-database VCFs. When both tools are run with the same Ensembl VEP annotation cache, the mafsmith vcf2maf subcommand produces zero conversion differences relative to vcf2maf across 23 diverse datasets aligned to GRCh38 or GRCh37. The companion maf2vcf, vcf2vcf, and maf2maf subcommands were similarly validated against their Perl reference counterparts across six datasets. Benchmarked on multiple reference samples totalling 27.5 million variants, mafsmith converts pre-annotated VCFs approximately 80-fold faster than vcf2maf (range 74.3–84.1×), enabling faster and cheaper conversion of VCFs to MAFs. mafsmith is open source under the same license as vcf2maf and is available at https://github.com/nf-osi/mafsmith.
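To make "allele normalisation" concrete: VCF represents indels with a shared leading reference base, whereas MAF drops that base, writes an empty allele as "-", and shifts the coordinates accordingly. The sketch below illustrates that convention as vcf2maf is commonly understood to apply it (trim shared leading bases, advance the position, assign SNP/DNP/TNP/ONP/INS/DEL, and give simple insertions the two flanking reference positions). It is not mafsmith's or vcf2maf's actual code: the `MafLocus` type and `vcf_to_maf_locus` function are hypothetical, and the sketch assumes a single biallelic record with plain nucleotide alleles, ignoring multiallelic sites, symbolic structural-variant alleles, and VEP allele matching.

```rust
/// Hypothetical MAF locus derived from one biallelic VCF record (illustration only).
#[derive(Debug, PartialEq)]
pub struct MafLocus {
    pub start: u64,
    pub end: u64,
    pub ref_allele: String,
    pub alt_allele: String,
    pub variant_type: &'static str,
}

/// Sketch of VCF-to-MAF allele normalisation; assumes non-empty ASCII REF/ALT.
pub fn vcf_to_maf_locus(pos: u64, reference: &str, alt: &str) -> MafLocus {
    assert!(!reference.is_empty() && !alt.is_empty(), "REF and ALT must be non-empty");
    let mut r = reference.to_string();
    let mut a = alt.to_string();
    let mut p = pos;

    // Trim shared leading bases one at a time while the alleles still differ,
    // advancing the position past each trimmed reference base.
    while !r.is_empty() && !a.is_empty() && r != a && r.as_bytes()[0] == a.as_bytes()[0] {
        r.remove(0);
        a.remove(0);
        p += 1;
    }

    let (rl, al) = (r.len() as u64, a.len() as u64);
    let (start, end, variant_type) = if rl == al {
        // Same-length substitutions: classify by length.
        let t = match rl {
            1 => "SNP",
            2 => "DNP",
            3 => "TNP",
            _ => "ONP",
        };
        (p, p + rl - 1, t)
    } else if rl == 0 {
        // Simple insertion: span the two reference bases flanking the inserted sequence.
        (p - 1, p, "INS")
    } else if rl < al {
        // Complex indel that gains bases.
        (p, p + rl - 1, "INS")
    } else {
        // Deletion (simple or complex): span the deleted reference bases.
        (p, p + rl - 1, "DEL")
    };

    // MAF writes an empty allele as "-".
    let dash = |s: String| if s.is_empty() { "-".to_string() } else { s };
    MafLocus {
        start,
        end,
        ref_allele: dash(r),
        alt_allele: dash(a),
        variant_type,
    }
}
```

For example, the VCF deletion `POS=100 REF=AT ALT=A` becomes a MAF `DEL` at 101–101 with `Reference_Allele=T` and an alternate allele of "-", while the insertion `POS=100 REF=A ALT=AT` becomes an `INS` spanning 100–101 with a reference allele of "-". Getting such edge cases right for every caller's quirks is what makes field-for-field parity with vcf2maf non-trivial.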
