FASA: Feature-Agnostic Stacked Autoencoders for Accurate Adverse Drug Reaction Prediction


Abstract

Purpose: Adverse drug reactions (ADRs) remain a major obstacle to drug safety, yet many computational predictors depend on molecular or biological features that are often unavailable for newly designed compounds. Several published models also report inflated performance because of biased preprocessing or unsuitable evaluation metrics. This work introduces a feature-free deep learning framework that predicts ADRs using only drug-ADR incidence matrices, which allows early-stage assessment even when auxiliary features are missing.

Methods: FASA (Feature-Agnostic Stacked Autoencoders) was developed and trained solely on binary drug-ADR incidence matrices. FASA includes a cardinality-preserving regularization term that constrains reconstructed ADR vectors to follow realistic label-count distributions, which prevents degenerate solutions and encourages the model to learn meaningful structure from sparse data. Performance was evaluated by cross-validation and reported as the area under the precision-recall curve (AUPR), a metric well suited to extremely sparse pharmacovigilance data.

Results: On the harmonized WPLMF dataset (1,177 drugs and 4,247 ADRs), FASA achieves an AUPR of 0.7150, surpassing every baseline model reported in the original study (MCS-MKL, FGRMF, IDSE-HE, Galeano, LogitMF, and WPLMF); WPLMF itself obtains 0.6553 under the same five-fold protocol. On the raw SIDER benchmark, FASA reaches an AUPR of 0.6456, again outperforming previously published results on the unmodified matrix.

Conclusion: These findings show that carefully regularized deep architectures can recover meaningful pharmacological structure directly from sparse incidence data. FASA offers a straightforward and competitive approach to large-scale ADR prediction using only drug-ADR incidence matrices, without requiring chemical, biological, or phenotypic features, and it generalizes across datasets with varying levels of curation.
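The abstract names two ingredients without giving their exact form: a cardinality-preserving penalty and AUPR as the evaluation metric. The NumPy sketch below illustrates one plausible reading of each and is not the authors' implementation. The function names, the squared count gap used as the penalty, and the use of average precision as the AUPR estimator are all assumptions made for illustration.

```python
import numpy as np


def cardinality_penalty(probs, targets):
    """Hypothetical cardinality term: mean squared gap between the expected
    number of predicted ADRs per drug and the observed number.

    probs   : (n_drugs, n_adrs) reconstructed probabilities in [0, 1]
    targets : (n_drugs, n_adrs) binary drug-ADR incidence matrix
    """
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    predicted_counts = probs.sum(axis=1)   # expected label count per drug
    observed_counts = targets.sum(axis=1)  # true label count per drug
    return float(np.mean((predicted_counts - observed_counts) ** 2))


def average_precision(y_true, scores):
    """Average precision, a common step-wise estimator of AUPR.

    All drug-ADR pairs are flattened and ranked by score; tied scores keep
    their input order (an assumption of this sketch).
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    scores = np.asarray(scores, dtype=float).ravel()
    n_pos = y_true.sum()
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, hits.size + 1)
    # Average the precision values at the ranks where a true ADR is found.
    return float((precision_at_k * hits).sum() / n_pos)
```

A penalty of this kind discourages the degenerate all-zero reconstruction that a sparse matrix otherwise rewards, because predicting no ADRs for a drug with many known ADRs produces a large count gap. Average precision ignores true negatives, which is why it is more informative than accuracy when almost every entry of the matrix is zero.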
