Supervised Learning for Predicting Unknown Modifying Variables in Pliable Lasso: Applications to High-Dimensional Datasets
Abstract
Accurate outcome prediction often requires modeling complex interactions between input features and context-specific modifiers. The pliable lasso is a flexible regression framework that integrates such modifiers into the prediction process. In many real-world applications, however, these modifiers are unobserved at test time and must be estimated. This study investigates the performance of eight supervised machine learning algorithms for estimating the modifier matrix Z in a pliable lasso model under a known-to-unknown scenario, in which Z is known for the training data but unknown for new observations. The analysis considers both classification accuracy for modifier estimation and regression accuracy for the final response prediction, using simulated data and two real-world datasets: the Superconductivity dataset and the Mice Protein Expression dataset. Results indicate that tree-based models (e.g., XGBoost, Random Forest, Decision Tree) deliver superior modifier classification (AUC > 0.99), while regularized models such as Lasso and Elastic Net achieve the best regression performance. The findings support a hybrid modeling approach in which tree-based classifiers estimate the modifying variables, followed by regularized regression for accurate and interpretable predictions. This strategy holds promise for data-driven modeling in high-dimensional engineering systems where only partial contextual information is available.
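The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes scikit-learn, a single binary modifier, and synthetic data invented here for illustration, and it replaces the pliable lasso with an ordinary cross-validated lasso fitted on a design matrix augmented with feature-by-modifier interactions. The helper `design` and all variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): n samples, p features, one binary modifier z.
n, p = 600, 20
X = rng.normal(size=(n, p))
# The modifier depends on X, so a classifier can learn it from the features.
z = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(int)
# Response: a main effect of feature 2, a z-modified effect of feature 3,
# and a direct effect of z, plus noise.
y = 2.0 * X[:, 2] + 3.0 * z * X[:, 3] + 1.0 * z + rng.normal(scale=0.5, size=n)

X_tr, X_te, z_tr, z_te, y_tr, y_te = train_test_split(
    X, z, y, test_size=0.3, random_state=0
)


def design(X, z):
    """Stack main effects, the modifier, and feature-by-modifier interactions."""
    z = np.asarray(z, dtype=float).reshape(-1, 1)
    return np.hstack([X, z, X * z])


# Stage 1: a tree-based classifier learns z from X where z is observed.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, z_tr)
z_hat = clf.predict(X_te)
auc = roc_auc_score(z_te, clf.predict_proba(X_te)[:, 1])

# Stage 2: a sparse regression is trained with the known z
# and evaluated with the estimated z on the held-out data.
reg = LassoCV(cv=5, random_state=0).fit(design(X_tr, z_tr), y_tr)
y_pred = reg.predict(design(X_te, z_hat))
mse = mean_squared_error(y_te, y_pred)

print(f"modifier AUC: {auc:.3f}, response MSE: {mse:.3f}")
```

In this sketch the regression stage never sees the true test-time modifier, so any misclassification in stage 1 propagates into the response error. That is why the two metrics are reported separately, mirroring the split between classification and regression accuracy in the abstract.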