A Multistage Binomial Model with Measurement Errors: Application to Protein Viability Prediction

Abstract

Many systems exhibit multistage mechanisms in which the failure of any single component leads to overall failure. Standard logistic regression, with its additive log-odds structure, is not well suited to such data-generating processes. We propose the Multistage Binomial (MSB) model, an extension of the binomial generalized linear model in which component effects combine multiplicatively on the success-probability scale. The MSB model naturally accounts for unobserved components by allowing the success probability to asymptote below one. It also incorporates measurement variability in the explanatory variables through a Berkson-type error framework. We establish conditions for identifiability and develop a penalized maximum likelihood estimation procedure and a non-standard likelihood ratio test for a unit asymptote. Using synthetic data on mutated protein viability, we show that the likelihood ratio test is conservative: it rejects the unit-asymptote assumption only when the data strongly support doing so. We also demonstrate that the MSB model provides more accurate inference and prediction than traditional logistic regression in multistage settings. Our simulations further show that larger sample sizes are required as the proportion of unobserved components increases. Finally, we provide a chart for determining the sample size of an MSB design in real data analysis, given the desired accuracy and the distribution of the predictors.
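The abstract does not state the model's exact functional form, so the following is only a minimal sketch of the idea under loudly stated assumptions. It assumes each observed component succeeds with a logistic probability in one covariate, that the overall success probability is the product of these terms scaled by an asymptote `alpha` in (0, 1] representing unobserved components, and that Berkson-type error means the true covariate equals the recorded value plus Gaussian noise. The function names `msb_success_prob` and `berkson_success_prob` and all parameters are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def expit(z: float) -> float:
    """Numerically stable logistic function mapping a real number to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def msb_success_prob(x, alpha, intercepts, slopes):
    """Hypothetical MSB-style success probability (assumed form).

    p(x) = alpha * prod_j expit(intercepts[j] + slopes[j] * x[j]).
    Component effects multiply on the probability scale, so any single
    component with probability near 0 drives the overall probability to 0,
    and alpha < 1 caps the success probability below one.
    """
    p = alpha
    for xj, b0, b1 in zip(x, intercepts, slopes):
        p *= expit(b0 + b1 * xj)
    return p


def berkson_success_prob(x_nominal, alpha, intercepts, slopes, sigma,
                         n_draws=2000, seed=0):
    """Monte Carlo expected success probability under assumed Berkson error.

    The true covariate is assumed to scatter around the recorded value:
    x_true = x_nominal + N(0, sigma^2), independently per component.
    The returned value averages the MSB probability over that scatter.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x_true = [xj + rng.gauss(0.0, sigma) for xj in x_nominal]
        total += msb_success_prob(x_true, alpha, intercepts, slopes)
    return total / n_draws
```

For example, `msb_success_prob([0.0, 0.0], 1.0, [0.0, 0.0], [1.0, 1.0])` returns 0.25, the product of two component probabilities of 0.5, whereas with `alpha=0.9` and very favorable covariates the probability approaches 0.9 rather than 1. Unlike logistic regression, where effects add on the log-odds scale, this product form lets one failing component dominate the outcome.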