Methods for Continuous-Valued Training Data Generation from Genome-Scale Metabolic Models: Partial-Inhibition FBA with Mixed Essentiality Sampling, Applied to ESKAPE Drug Target Curation

Byeongsoo Kang

Abstract

Background. Computational antimicrobial target discovery faces three methodological limitations: (i) knockout-only FBA yields binary phenotypes unsuitable for regression training, (ii) no experimentally labeled toxicity datasets exist at the gene-target level, and (iii) pipelines rarely report negative validation results.

Methods. We describe a pipeline addressing each limitation. We introduce partial gene inhibition simulation (10-100% flux reduction) applied to mixed essential/non-essential gene sets (39 + 30 genes from 1,516 iML1515 genes), generating 945 continuous-valued FBA simulations as regression training targets (not independent drug response predictions). We describe two ANN architectures: a subsystem-structured ANN (61.5% parameter reduction over fully connected baselines) and a dual-head ANN for joint potency-toxicity regression. We propose an exploratory toxicity labeling heuristic (sequence homology 35%, pathway overlap 30%, conservation 20%, cross-reactivity 15%); weights are an initial proposal pending experimental calibration. These components are integrated with a Neo4j knowledge graph, local LLM literature mining (46% effective precision), and AlphaFold structural analysis.

Results. Applied to three ESKAPE pathogen models (iML1515, iYS1720, iYL1228), the pipeline curates 29 targets lacking approved therapeutics from 39 literature-validated essential genes. Sequence-based audit reveals 11 of 21 targets lack detectable human homologs; folA shows 30% identity to human DHFR2, consistent with known trimethoprim cross-reactivity. Prospective-style temporal validation (2020 cutoff) shows the composite scoring heuristic did not exceed a random baseline (F1 = 0.519, z = -0.99), establishing the pipeline as a hypothesis generation tool rather than a predictive model. Double knockout of essential gene pairs produced indistinguishable lethal phenotypes, indicating partial inhibition grids are required for meaningful combination scoring.

Conclusions. The methods (partial inhibition FBA, two ANN architectures, multi-evidence toxicity labeling, and four-way integration) are individually reusable. The complete pipeline (10-tab dashboard, 40 tests, all code) is released under MIT license at https://github.com/shoo99/ai-drug-target.
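As a rough illustration of two ideas named in the abstract, the sketch below shows (a) how a partial-inhibition grid could be turned into scaled flux bounds and (b) how the four toxicity components could be combined with the stated weights. It is not taken from the released repository. The function names, the choice of a 10% step, the modelling of inhibition as uniform scaling of a reaction's bounds by (1 - inhibition), and the assumption that each toxicity component is already normalised to [0, 1] are all assumptions made here for illustration; only the 10-100% range and the 35/30/20/15 weights come from the abstract.

```python
from typing import Dict, List, Tuple

# Weights as stated in the abstract; the key names are illustrative labels.
TOXICITY_WEIGHTS: Dict[str, float] = {
    "sequence_homology": 0.35,
    "pathway_overlap": 0.30,
    "conservation": 0.20,
    "cross_reactivity": 0.15,
}


def inhibition_grid(start: int = 10, stop: int = 100, step: int = 10) -> List[float]:
    """Return inhibition fractions covering the 10-100% range.

    The 10% step is an assumption; the abstract gives only the range.
    """
    return [pct / 100 for pct in range(start, stop + 1, step)]


def inhibited_bounds(lower: float, upper: float, inhibition: float) -> Tuple[float, float]:
    """Scale a reaction's flux bounds by the fraction of activity retained.

    Assumes inhibition acts as a uniform scaling of both bounds, so
    inhibition = 1.0 reproduces a full knockout (both bounds become zero).
    """
    if not 0.0 <= inhibition <= 1.0:
        raise ValueError("inhibition must lie in [0, 1]")
    retained = 1.0 - inhibition
    return lower * retained, upper * retained


def toxicity_score(components: Dict[str, float]) -> float:
    """Weighted sum of the four evidence components, each assumed to be in [0, 1]."""
    missing = set(TOXICITY_WEIGHTS) - set(components)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    for name in TOXICITY_WEIGHTS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return sum(weight * components[name] for name, weight in TOXICITY_WEIGHTS.items())
```

In an actual FBA workflow, the scaled bounds would be applied to every reaction associated with the inhibited gene before re-optimising the model, and the resulting objective value at each grid point would serve as one continuous-valued training target. Because the weights sum to 1, the composite score stays in [0, 1] whenever the inputs do.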

Version published to 10.21203/rs.3.rs-9374605/v1 on Research Square
Apr 13, 2026

Related articles

Three Classes of Confound in Gene-Regulatory-Network Inference: A Systematic Audit and Open-Source Diagnostic Toolkit

This article has 1 author:
1. Ihor Kendiukhov
This article has no evaluations. Latest version Mar 26, 2026

Variance Decomposition Accesses a Clinically Supported Discovery Space Systematically Missed by Mean-Based Transcriptomic Prioritization

This article has 1 author:
1. XIAOQI HU
This article has no evaluations. Latest version Mar 30, 2026

Pathway-based machine learning for breast cancer risk stratification: an interpretable framework validated in two independent cohorts

This article has 2 authors:
1. Suhaan Thayyil
2. Eshaan Nidee
This article has no evaluations. Latest version Apr 8, 2026