Automated GenePy Gene-Burden Computation via a Reproducible Nextflow Workflow Integrated with the Genomics England (GEL) Lifebit Platform

Iman Nazari
Guo Cheng
James Ashton
Sarah Ennis

Abstract

Interpretation of rare-disease genomes remains constrained by variant-centric analytical frameworks that do not adequately capture the cumulative impact of multiple variants within a gene. GenePy provides an individual-level, gene-based burden metric that integrates variant consequence, allele frequency, and zygosity into a single quantitative score, shifting interpretation from discrete variant annotation to aggregated gene-level burden. Within Genomics England, this formulation supports a panel-agnostic, genotype-to-phenotype diagnostic strategy for unresolved monogenic disorders, in which genes with elevated mutational burden are prioritised for each individual.
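
The abstract lists the inputs to GenePy but not the formula. A minimal Python sketch follows, assuming a GenePy-style score of the form S_g = Σ_i D_i · (−log10(f_i1 · f_i2)), where D_i is a deleteriousness score for variant i already scaled to [0, 1] and f_i1, f_i2 are the population frequencies of the two alleles the individual carries at that position. Under this assumption zygosity enters through the allele frequencies: a heterozygous carrier of an alternate allele with frequency p contributes D_i · (−log10((1 − p) · p)), a homozygous carrier contributes D_i · (−2 · log10 p), and a homozygous-reference genotype is assumed to contribute nothing. The Variant record, the AF_FLOOR value, and the function names are hypothetical, not taken from the published workflow.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# Hypothetical floor so that an allele absent from the reference population
# (frequency 0) does not produce log10(0); the real workflow may differ.
AF_FLOOR = 1e-5


@dataclass(frozen=True)
class Variant:
    gene: str               # gene symbol the variant is annotated to
    deleteriousness: float  # D_i, assumed already scaled to [0, 1]
    alt_af: float           # population frequency of the alternate allele
    alt_count: int          # individual's genotype: 0, 1 or 2 alternate alleles


def variant_score(v: Variant) -> float:
    """Contribution D_i * -log10(f_i1 * f_i2) of one variant for one individual."""
    if v.alt_count not in (0, 1, 2):
        raise ValueError(f"unsupported alternate-allele count: {v.alt_count}")
    if v.alt_count == 0:
        return 0.0  # homozygous reference: assumed to contribute nothing
    # Clamp so both the alternate and the reference frequency stay >= AF_FLOOR.
    p = min(max(v.alt_af, AF_FLOOR), 1.0 - AF_FLOOR)
    # Frequencies of the two alleles this individual actually carries.
    f1, f2 = (1.0 - p, p) if v.alt_count == 1 else (p, p)
    return v.deleteriousness * -math.log10(f1 * f2)


def gene_scores(variants: Iterable[Variant]) -> Dict[str, float]:
    """Sum per-variant contributions into one burden score per gene."""
    totals: Dict[str, float] = defaultdict(float)
    for v in variants:
        totals[v.gene] += variant_score(v)
    return dict(totals)


# Usage: score one individual's calls, then rank genes by burden.
calls = [
    Variant("GENE_A", 0.9, 0.001, 1),  # rare heterozygous variant
    Variant("GENE_A", 0.5, 0.01, 2),   # low-frequency homozygous variant
    Variant("GENE_B", 0.8, 0.2, 0),    # homozygous reference, scores 0
]
ranked = sorted(gene_scores(calls).items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Sorting the per-gene totals gives a per-individual ranking of the kind used for prioritisation; because each gene's score is a plain sum, it grows with both the number and the rarity of deleterious alleles in that gene.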

Here, we present a fully automated, containerised GenePy workflow implemented in Nextflow and integrated into the Genomics England (GEL) Research Environment via the Lifebit CloudOS platform. The implementation provides scalable, secure, and governance-compliant computation of gene-level burden scores across population-scale cohorts. The workflow combines variant annotation, quality control, and chunked data aggregation in modular, reproducible processes designed for high-throughput execution on cloud-native infrastructure. By enabling robust, portable, and auditable gene-level scoring across large rare-disease sequencing datasets, the framework improves analytical resolution and supports downstream statistical prioritisation, integrative phenotype matching, and hypothesis generation within genotype-to-phenotype diagnostic workflows.
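
The abstract names chunked data aggregation but not how it is carried out. The sketch below illustrates the scatter-gather logic in plain Python rather than as Nextflow processes, and it rests on two assumptions: chunks are disjoint sets of variant records (for example, genomic regions), and the per-individual gene score is a sum of per-variant contributions. Under those assumptions each chunk can be reduced independently and the partial per-(sample, gene) sums merged by addition. The record layout and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One scored record: (sample_id, gene, per-variant contribution), e.g. the
# output of a per-variant scoring step applied after QC.
Record = Tuple[str, str, float]
Partial = Dict[Tuple[str, str], float]


def score_chunk(records: Iterable[Record]) -> Partial:
    """Scatter step: sum contributions per (sample, gene) within one chunk."""
    partial: Dict[Tuple[str, str], float] = defaultdict(float)
    for sample, gene, contribution in records:
        partial[(sample, gene)] += contribution
    return dict(partial)


def merge_partials(partials: Iterable[Partial]) -> Partial:
    """Gather step: partial sums add, because the score is a sum over variants.

    This also covers genes whose variants were split across several chunks.
    """
    merged: Dict[Tuple[str, str], float] = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)


def to_matrix(merged: Partial) -> Tuple[List[str], List[str], List[List[float]]]:
    """Lay scores out as a sample x gene matrix, with 0.0 where nothing was scored."""
    samples = sorted({sample for sample, _ in merged})
    genes = sorted({gene for _, gene in merged})
    rows = [[merged.get((s, g), 0.0) for g in genes] for s in samples]
    return samples, genes, rows


# Usage: two chunks (e.g. two genomic regions); GENE_A spans both of them.
chunks = [
    [("S1", "GENE_A", 1.2), ("S2", "GENE_A", 0.4)],
    [("S1", "GENE_A", 0.8), ("S1", "GENE_B", 2.5)],
]
samples, genes, matrix = to_matrix(merge_partials(score_chunk(c) for c in chunks))
print(samples, genes, matrix)
```

Because addition is commutative and associative (up to floating-point rounding), the merge does not depend on the order in which chunks finish, which is what makes this pattern a natural fit for parallel scatter-gather execution.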

Version published to 10.64898/2026.05.22.26353863 on medRxiv, May 24, 2026
