Feature Engineering and Predictive Modeling for Housing Prices: A Case Study Using the Ames, Iowa Dataset
Abstract
This study explores what drives housing prices in Ames, Iowa by examining both the usual structural and spatial characteristics of homes and a set of new variables engineered from the original dataset. Three newly created variables, namely the percentage of finished living area, the proportion of basement area to total living space, and the years since the last remodel, are used to enhance interpretability and predictive power. In this paper, I investigate the effectiveness of combining these engineered variables with traditional predictors across five supervised learning models: Linear Regression, Ridge Regression, Lasso Regression, Random Forest, and XGBoost. Model performance was assessed quantitatively using RMSE, R², correlation coefficients, and SHAP-based interpretability analyses. The results show that the engineered features consistently improved predictive accuracy across all models, although their contribution was supplementary rather than dominant. SHAP analysis further reveals that while traditional predictors remain highly influential, the engineered features offer additional explanatory depth by capturing certain structural and temporal patterns.
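As a rough illustration of the three engineered variables named above, the following minimal sketch computes them for a single house record. The abstract does not give the exact formulas, so the definitions here are assumptions: finished area is taken as above-grade living area plus finished basement area, total living space as above-grade living area plus total basement area, and years since remodel as sale year minus remodel year. The column names (`GrLivArea`, `TotalBsmtSF`, `BsmtFinSF1`, `BsmtFinSF2`, `YearRemodAdd`, `YrSold`) follow the commonly distributed version of the Ames data, and the function name and output keys are hypothetical.

```python
from typing import Dict


def engineer_features(house: Dict[str, float]) -> Dict[str, float]:
    """Compute three illustrative engineered features for one house record.

    All formulas are assumptions made for this sketch; the paper's own
    definitions may differ.
    """
    above_grade = house["GrLivArea"]      # above-grade living area (sq ft)
    basement = house["TotalBsmtSF"]       # total basement area (sq ft)
    finished_basement = house["BsmtFinSF1"] + house["BsmtFinSF2"]

    # Assumed definition of "total living space".
    total_space = above_grade + basement
    # Assumed definition of "finished living area".
    finished_space = above_grade + finished_basement

    if total_space > 0:
        pct_finished = 100.0 * finished_space / total_space
        basement_ratio = basement / total_space
    else:
        # Avoid division by zero for records with no recorded area.
        pct_finished = 0.0
        basement_ratio = 0.0

    # Clip at zero in case a remodel is recorded after the sale year.
    years_since_remodel = max(house["YrSold"] - house["YearRemodAdd"], 0)

    return {
        "pct_finished_area": pct_finished,
        "basement_ratio": basement_ratio,
        "years_since_remodel": years_since_remodel,
    }


# Hypothetical record, not taken from the dataset.
example = {
    "GrLivArea": 1500, "TotalBsmtSF": 1000,
    "BsmtFinSF1": 400, "BsmtFinSF2": 100,
    "YearRemodAdd": 1995, "YrSold": 2008,
}
features = engineer_features(example)
print(features)
```

In practice the same arithmetic would be applied column-wise to the full dataset before model fitting; the per-record form is used here only to keep the definitions easy to read.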