Gfa2bin enables graph-based GWAS by converting genome graphs to pan-genomic genotypes

Abstract

Variation graphs represent genomic diversity more completely than traditional linear reference genomes, capturing complex features that are otherwise inaccessible to analysis. Integrating these graphs with genome-wide association studies (GWAS) should therefore enable a more comprehensive understanding of genetic landscapes, potentially uncovering novel associations between genetic variation and traits. Such an approach draws on the full genomic information in the graph and can provide deeper insight into the genetic basis of complex traits. Our tool, gfa2bin, offers multiple methods to (i) genotype variation graphs and (ii) convert the genotypes to well-established data formats for GWAS. We demonstrate that variation graphs are feasible alternatives to traditional linear references for GWAS. Our case study using Arabidopsis thaliana and 1,695 traits shows that our approach complements SNP-based approaches: it often identifies additional associations, and its associations are, on average, more significant than those found by SNP-based approaches. gfa2bin is implemented in Rust. Commented source code is available under the MIT license at https://github.com/MoinSebi/gfa2bin. Examples of how to run gfa2bin are provided in the documentation. We added several Python scripts and a Snakemake pipeline to simplify running the tool on larger data sets. In addition, we recommend using packing (https://github.com/MoinSebi/packing) for reduced storage and for preprocessing (normalization) of sequence-to-graph alignment coverage.
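To make the idea of genotyping a variation graph concrete, the following is a minimal sketch in Python. It is not gfa2bin's implementation. It assumes a GFA v1 graph in which each path (P line) corresponds to one sample, and it treats a node as present in a sample if that sample's path visits it. The function names and the toy graph are hypothetical and exist only for illustration.

```python
def parse_gfa_paths(gfa_text):
    """Return (sorted node ids, {path name: set of node ids}) from GFA v1 text."""
    nodes = set()
    paths = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            # Segment line: S <id> <sequence>
            nodes.add(fields[1])
        elif fields[0] == "P":
            # Path line: P <name> <steps> <overlaps>; steps look like "1+,2-,3+".
            steps = fields[2].split(",")
            # Drop the trailing orientation sign to keep only the node id.
            paths[fields[1]] = {step[:-1] for step in steps if step}
    return sorted(nodes), paths


def presence_absence_matrix(nodes, paths):
    """One row per node, one 0/1 entry per path (sample), in sorted path order."""
    samples = sorted(paths)
    matrix = {
        node: [1 if node in paths[sample] else 0 for sample in samples]
        for node in nodes
    }
    return samples, matrix


# Toy graph (assumption): three samples that differ at a single bubble.
EXAMPLE_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tG",
    "S\t3\tT",
    "S\t4\tCCA",
    "P\tsampleA\t1+,2+,4+\t*",
    "P\tsampleB\t1+,3+,4+\t*",
    "P\tsampleC\t1+,4+\t*",
])

nodes, paths = parse_gfa_paths(EXAMPLE_GFA)
samples, matrix = presence_absence_matrix(nodes, paths)

print("node", *samples, sep="\t")
for node in nodes:
    print(node, *matrix[node], sep="\t")
```

In this toy graph, nodes 1 and 4 are visited by every sample and so carry no signal for association testing, whereas nodes 2 and 3 vary between samples. A table like this, with nodes in place of SNPs, is the kind of genotype matrix that would then be written to a standard GWAS input format.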
