GeneFíor: A back to basics and transparent multi-tool approach to sequence detection
Abstract
The detection of sequences of interest, such as antimicrobial resistance genes, directly from genomic and metagenomic sequencing data has become routine, enabled by curated reference databases and rapid in silico sequence search tools. Yet most workflows depend on prior assembly, an inherently lossy process in which a substantial proportion of reads either fail to assemble or are collapsed into consensus sequences, so that low-abundance variants and nucleotide-level diversity are systematically obscured. The tools used to interrogate the resulting assemblies compound the problem: they cluster reference sequences at arbitrary identity thresholds, impose hidden parameter defaults, and reduce intermediate alignment evidence to summarised outputs that cannot be critically evaluated or reproduced.
Here we present GeneFíor, a transparent, multi-tool workflow that integrates BLAST, DIAMOND, Bowtie2, BWA, and Minimap2 to search both DNA and protein sequences against any user-supplied reference database. By enforcing gene-centric identity and coverage thresholds at both the read and gene level, GeneFíor reduces false positives while retaining sensitivity to genuine, low-abundance variants, including variants that differ by a single nucleotide. Crucially, by exposing all alignment parameters, preserving intermediate outputs, and generating cross-tool consensus detection matrices, GeneFíor makes the influence of tool choice, database selection, and parameter configuration on reported gene profiles directly observable and reproducible.
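To make the idea of read-level and gene-level thresholds feeding a cross-tool matrix concrete, the following is a minimal Python sketch. It is not GeneFíor's implementation. The `Hit` record, the function names, and the threshold values (80% identity, 90% of the read aligned, 80% gene coverage) are hypothetical and chosen only for illustration. The sketch also assumes that alignments have already been parsed from each tool's output into a common form. A read alignment is kept only if it clears the identity and read-coverage thresholds. A gene is then reported for a tool only if the surviving reads together cover enough of its length. Finally, each gene's calls are tabulated across tools.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical, tool-agnostic record of one read-to-gene alignment."""
    tool: str          # aligner that produced the hit, e.g. "BLAST" or "Bowtie2"
    read_id: str
    gene: str
    identity: float    # percent identity of the alignment (0-100)
    read_length: int
    ref_start: int     # 0-based inclusive start on the reference gene
    ref_end: int       # 0-based exclusive end on the reference gene
    gene_length: int


def passes_read_filter(hit: Hit, min_identity: float, min_read_cov: float) -> bool:
    # Simplification: the reference span stands in for the number of aligned read bases.
    aligned = hit.ref_end - hit.ref_start
    return hit.identity >= min_identity and aligned / hit.read_length >= min_read_cov


def gene_coverage(hits: list[Hit]) -> dict[str, float]:
    """Fraction of each gene's positions covered by at least one hit."""
    covered: dict[str, set[int]] = defaultdict(set)
    lengths: dict[str, int] = {}
    for h in hits:
        covered[h.gene].update(range(h.ref_start, h.ref_end))
        lengths[h.gene] = h.gene_length
    return {g: len(pos) / lengths[g] for g, pos in covered.items()}


def consensus_matrix(hits: list[Hit], min_identity: float = 80.0,
                     min_read_cov: float = 0.9, min_gene_cov: float = 0.8):
    """Return (genes, tools, matrix), where matrix[gene][tool] is True if detected."""
    # Read-level filter, applied separately for each tool's hits.
    passing: dict[str, list[Hit]] = defaultdict(list)
    for h in hits:
        if passes_read_filter(h, min_identity, min_read_cov):
            passing[h.tool].append(h)

    # Gene-level filter: a gene counts as detected by a tool only if that
    # tool's passing reads jointly cover enough of the gene.
    tools = sorted({h.tool for h in hits})
    detected = {
        t: {g for g, cov in gene_coverage(passing[t]).items() if cov >= min_gene_cov}
        for t in tools
    }
    genes = sorted(set().union(*detected.values()))
    matrix = {g: {t: g in detected[t] for t in tools} for g in genes}
    return genes, tools, matrix


# Made-up example: two tools aligning reads to one 300 bp gene.
example_hits = [
    Hit("BLAST", "r1", "geneA", 99.0, 150, 0, 150, 300),
    Hit("BLAST", "r2", "geneA", 98.5, 150, 150, 300, 300),
    Hit("Bowtie2", "r1", "geneA", 99.0, 150, 0, 150, 300),
    Hit("Bowtie2", "r3", "geneA", 70.0, 150, 150, 300, 300),  # fails the identity filter
]
genes, tools, matrix = consensus_matrix(example_hits)
for g in genes:
    print(g, matrix[g])  # geneA {'BLAST': True, 'Bowtie2': False}
```

In this toy run, the two tools disagree on `geneA`. One Bowtie2 read falls below the identity threshold, so that tool's passing reads cover only half of the gene. A cross-tool matrix is meant to show disagreements like this side by side rather than collapse them into a single call.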