Methods for Continuous-Valued Training Data Generation from Genome-Scale Metabolic Models: Partial-Inhibition FBA with Mixed Essentiality Sampling, Applied to ESKAPE Drug Target Curation


Abstract

Background. Computational antimicrobial target discovery faces three methodological limitations: (i) knockout-only FBA yields binary phenotypes unsuitable for regression training, (ii) no experimentally labeled toxicity datasets exist at the gene-target level, and (iii) pipelines rarely report negative validation results.

Methods. We describe a pipeline addressing each limitation. We introduce partial gene inhibition simulation (10-100% flux reduction) applied to mixed essential/non-essential gene sets (39 + 30 genes from 1,516 iML1515 genes), generating 945 continuous-valued FBA simulations as regression training targets (not independent drug response predictions). We describe two ANN architectures: a subsystem-structured ANN (61.5% parameter reduction over fully connected baselines) and a dual-head ANN for joint potency-toxicity regression. We propose an exploratory toxicity labeling heuristic (sequence homology 35%, pathway overlap 30%, conservation 20%, cross-reactivity 15%); weights are an initial proposal pending experimental calibration. These components are integrated with a Neo4j knowledge graph, local LLM literature mining (46% effective precision), and AlphaFold structural analysis.

Results. Applied to three ESKAPE pathogen models (iML1515, iYS1720, iYL1228), the pipeline curates 29 targets lacking approved therapeutics from 39 literature-validated essential genes. Sequence-based audit reveals 11 of 21 targets lack detectable human homologs; folA shows 30% identity to human DHFR2, consistent with known trimethoprim cross-reactivity. Prospective-style temporal validation (2020 cutoff) shows the composite scoring heuristic did not exceed a random baseline (F1 = 0.519, z = -0.99), establishing the pipeline as a hypothesis generation tool rather than a predictive model. Double knockout of essential gene pairs produced indistinguishable lethal phenotypes, indicating partial inhibition grids are required for meaningful combination scoring.

Conclusions. The methods (partial inhibition FBA, two ANN architectures, multi-evidence toxicity labeling, and four-way integration) are individually reusable. The complete pipeline (10-tab dashboard, 40 tests, all code) is released under MIT license at https://github.com/shoo99/ai-drug-target.
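
As an illustration of the arithmetic behind two of the methods named above, the following is a minimal Python sketch, not the released implementation. Only the four toxicity weights and the 10-100% inhibition range are taken from the abstract. The function names, the assumption that each evidence component is scaled to [0, 1], and the choice to model partial inhibition by scaling a reaction's flux bounds are all hypothetical.

```python
# Illustrative sketch only. Names and input scaling are assumptions;
# the weights and the 10-100% range are the values stated in the abstract.

TOXICITY_WEIGHTS = {
    "sequence_homology": 0.35,
    "pathway_overlap": 0.30,
    "conservation": 0.20,
    "cross_reactivity": 0.15,
}


def toxicity_score(evidence):
    """Weighted sum of evidence components, each assumed to lie in [0, 1]."""
    missing = set(TOXICITY_WEIGHTS) - set(evidence)
    if missing:
        raise KeyError(f"missing evidence components: {sorted(missing)}")
    for name in TOXICITY_WEIGHTS:
        if not 0.0 <= evidence[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {evidence[name]}")
    return sum(weight * evidence[name] for name, weight in TOXICITY_WEIGHTS.items())


def inhibition_grid(steps=10):
    """Evenly spaced inhibition fractions, 0.1 to 1.0 for the default of 10 steps."""
    return [(i + 1) / steps for i in range(steps)]


def inhibited_bounds(lower, upper, inhibition):
    """Scale a reaction's flux bounds to the fraction left after inhibition.

    Assumption: inhibition is applied to the bounds themselves; a pipeline
    could instead scale relative to the wild-type optimal flux.
    """
    if not 0.0 <= inhibition <= 1.0:
        raise ValueError("inhibition must be in [0, 1]")
    remaining = 1.0 - inhibition
    return lower * remaining, upper * remaining


if __name__ == "__main__":
    # Hypothetical evidence values for a single candidate target.
    example = {
        "sequence_homology": 0.3,
        "pathway_overlap": 0.5,
        "conservation": 0.2,
        "cross_reactivity": 0.0,
    }
    print("toxicity score:", toxicity_score(example))
    for level in inhibition_grid():
        print(level, inhibited_bounds(-10.0, 10.0, level))
```

In an actual FBA run, the scaled bounds would be assigned to the reactions associated with the inhibited gene before re-optimizing the model, and the resulting objective value at each inhibition level would supply one continuous-valued training target.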
