Simulation and empirical evaluation of biologically-informed neural network performance

Gwen A. Miller
Ahmed Roman
Marc Glettig
Haitham A. Elmarakeby
Saud H. AlDubayan
Jihye Park
Ryan L. Collins
Eliezer M. Van Allen

Abstract

Biologically-informed neural networks (BiNNs) offer interpretable deep learning models for biological data, but the dataset characteristics required for strong performance remain poorly understood. For instance, we previously developed P-NET, a BiNN with an architecture based on the Reactome pathway database, and applied this model to predict metastatic status of patients with prostate cancer using somatic mutation and copy number information. It seems likely that including additional relevant signal – e.g., germline variation in this context – should improve model performance, but we currently lack a principled approach to assess whether BiNNs will successfully detect this signal.

Here, we developed two simulation frameworks to evaluate the factors that influence BiNN performance – including signal type, signal strength, feature sparsity, and sample size – and empirically tested how integrating germline and somatic data affects the model’s ability to predict prostate cancer metastatic status. Simulations revealed that small sample size, weak signal strength, and especially extreme feature sparsity limit BiNN performance, and that the model preferentially uses linear over nonlinear signal. Empirically, P-NET performed poorly on sparse germline data, and while adding germline to somatic data did not improve prediction, it improved gene prioritization and model interpretation.
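To make the simulated factors concrete, here is a minimal sketch of how a synthetic dataset could vary signal type, signal strength, feature sparsity, and sample size. It is not the authors' simulation framework: the function name `simulate_dataset`, its parameters, the binary feature encoding, and the logistic label model are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_dataset(n_samples=500, n_features=200, n_causal=10,
                     sparsity=0.95, signal_strength=1.0,
                     signal_type="linear", seed=0):
    """Hypothetical generator of sparse binary features with a planted signal.

    All names and modelling choices here are illustrative assumptions,
    not the method described in the preprint.
    """
    rng = np.random.default_rng(seed)

    # Binary feature matrix; `sparsity` is the expected fraction of zeros.
    X = (rng.random((n_samples, n_features)) > sparsity).astype(np.int8)

    # Pick which features carry signal.
    causal = rng.choice(n_features, size=n_causal, replace=False)

    if signal_type == "linear":
        # Additive burden: count of altered causal features per sample.
        score = X[:, causal].sum(axis=1).astype(float)
    elif signal_type == "nonlinear":
        # Pairwise interactions: a pair contributes only if both are altered.
        pairs = causal[: 2 * (n_causal // 2)].reshape(-1, 2)
        score = (X[:, pairs[:, 0]] * X[:, pairs[:, 1]]).sum(axis=1).astype(float)
    else:
        raise ValueError(f"unknown signal_type: {signal_type!r}")

    # Logistic label model; signal_strength = 0 yields pure-noise labels.
    logits = signal_strength * (score - score.mean())
    prob = 1.0 / (1.0 + np.exp(-logits))
    y = rng.binomial(1, prob)
    return X, y, causal


# Example sweep over sparsity at a fixed sample size (values are arbitrary).
for s in (0.5, 0.9, 0.99):
    X, y, causal = simulate_dataset(n_samples=300, sparsity=s, seed=1)
    print(s, X.mean(), y.mean())
```

Sweeping `sparsity`, `signal_strength`, `n_samples`, and `signal_type` over a grid and training the same model on each dataset is one simple way to probe which dataset characteristics limit performance, in the spirit of the evaluation described above.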

Broadly, our simulation frameworks enable systematic evaluation of how dataset-level characteristics affect BiNN performance and provide a principled framework for benchmarking novel methods.

Version published on bioRxiv on Nov 14, 2025 (DOI: 10.1101/2025.11.13.687845)
