fastVEP: A Fast, Comprehensive Variant Effect Predictor Written in Rust
Abstract
The annotation of genomic variants with their predicted functional consequences is a critical step in genomics research and clinical diagnostics. The widely used Ensembl Variant Effect Predictor (VEP), implemented in Perl, faces performance limitations when processing the increasingly large variant call sets generated by modern sequencing studies. Here we present fastVEP, a complete reimplementation of the VEP variant annotation engine in Rust. fastVEP annotates the complete GIAB HG002 clinical whole-genome sequencing benchmark (4.05 million high-confidence variants) against the full Ensembl GRCh38 gene model (508,530 transcripts) in 86 seconds. Multi-organism benchmarks on complete gold-standard datasets, including 26 million Mouse Genomes Project variants and 12.9 million Arabidopsis 1001 Genomes variants, demonstrate sustained throughput of 47,000 to 86,000 variants per second. In head-to-head benchmarks, fastVEP achieves up to a 130× speedup over Ensembl VEP release 115.1. Validation against the same Ensembl VEP release showed 100% concordance across 23 annotation fields on 2,340 shared transcript-allele pairs.
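As a quick consistency check on the figures above, the HG002 runtime can be converted into an average throughput. The snippet below is only an arithmetic sketch that uses the two numbers quoted in this abstract; the function name `throughput` is illustrative and is not part of fastVEP.

```python
# Figures quoted in the abstract for the GIAB HG002 benchmark.
HG002_VARIANTS = 4_050_000   # high-confidence variants
HG002_SECONDS = 86           # reported wall-clock time


def throughput(n_variants: int, wall_seconds: float) -> float:
    """Return the average number of variants annotated per second."""
    return n_variants / wall_seconds


rate = throughput(HG002_VARIANTS, HG002_SECONDS)
print(f"{rate:,.0f} variants/s")  # prints "47,093 variants/s"
```

The result, roughly 47,000 variants per second, sits at the lower end of the reported 47,000 to 86,000 range.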
Beyond core consequence prediction, fastVEP provides a comprehensive supplementary annotation framework (fastSA) with a native binary format for direct integration with the ClinVar, gnomAD, dbSNP, COSMIC, 1000 Genomes, TOPMed, and MitoMap databases. It also provides prediction and conservation scores, including PhyloP, GERP, REVEL, SpliceAI, PrimateAI, and SIFT/PolyPhen via dbNSFP; structural variant annotation (DEL, DUP, INV, CNV, BND) with SV-specific consequence prediction; gene-level annotations from OMIM and gnomAD gene constraint metrics; a filter_vep-compatible expression-based filter engine; multi-sample genotype parsing; regulatory region detection; and mitochondrial-specific variant handling.
fastVEP supports both the GRCh38 and GRCh37 genome builds and ships as a single 3.3 MB statically linked binary with zero external dependencies. It predicts 49 Sequence Ontology consequence terms, outputs 48 VEP-compatible CSQ annotation fields, supports VCF, tab-delimited, and JSON output formats, generates HGVS nomenclature, and includes a built-in web interface for interactive variant annotation. fastVEP is open source under the Apache 2.0 license and is available at https://github.com/Huang-lab/fastVEP. A hosted web server is available at https://fastVEP.org.
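To illustrate what VEP-compatible CSQ output means for downstream tools, the following minimal sketch parses a CSQ annotation from a VCF INFO column. It assumes the usual Ensembl VEP convention: the `##INFO=<ID=CSQ,...>` header lists pipe-separated field names after `Format: `, and each record carries one comma-separated entry per transcript-allele pair. The header, the four-field layout, the INFO string, and the gene names are made up for illustration and are not taken from fastVEP output, which reports 48 fields.

```python
def csq_field_names(header_line: str) -> list[str]:
    """Extract the pipe-separated field names that follow 'Format: ' in the CSQ header."""
    description = header_line.split("Format: ", 1)[1]
    return description.rstrip('">').strip().split("|")


def parse_csq(info: str, names: list[str]) -> list[dict[str, str]]:
    """Return one dict per transcript-allele entry found in the CSQ key of an INFO string."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            records = entry[len("CSQ="):].split(",")
            return [dict(zip(names, record.split("|"))) for record in records]
    return []  # no CSQ key present


# Hypothetical header and INFO column, shortened to four fields for readability.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations. '
          'Format: Allele|Consequence|IMPACT|SYMBOL">')
info = "DP=30;CSQ=T|missense_variant|MODERATE|GENE1,T|intron_variant|MODIFIER|GENE2"

names = csq_field_names(header)
annotations = parse_csq(info, names)
for annotation in annotations:
    print(annotation["SYMBOL"], annotation["Consequence"], annotation["IMPACT"])
```

Reading the field names from the header rather than hard-coding them keeps the parser independent of how many CSQ fields a given run emits.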