Estimating Home Prices Using Polynomial Regression with Interactions: Evidence from Orange County Housing Data (2000–2005)
Abstract
Accurately predicting house prices is a persistent challenge, even for seasoned real estate professionals, investors, and regulators, especially in areas where the population fluctuates. This research uses 10,000 home sales recorded between 2000 and 2005 in Orange County, Florida, to estimate residential property values by developing and comparing multiple linear regression models. The analysis evaluates predictive performance across models that incorporate structural attributes, geospatial coordinates, and interaction variables, using both original and engineered features. Models are compared using 10-fold cross-validation (CV), with mean squared error (MSE) as the evaluation criterion. Of the five models tested, the most accurate is a cubic polynomial regression with key interaction terms, which combines home size, distance to central business districts, usage ratios, and interaction effects and achieves an average test MSE of 2,228.56 and an R² of 0.849. The study highlights the importance of considering not only individual variables (e.g., house size, number of bedrooms, number of bathrooms) but also interaction variables (such as the interaction between house size and pool, usage ratios, and distance to pool) when modelling expected house prices. Although the approach balances model complexity against computational feasibility, its reliance on historical data and its computational constraints limit the study's ability to explore more complex non-linear models. The methodology provides a scalable framework for future modelling that can be adapted to updated datasets and other regions.
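The evaluation procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or feature set. It assumes synthetic data, NumPy as the only dependency, and hypothetical names (`size`, `dist`, `pool`, `design_matrix`, `kfold_mse`) chosen purely for illustration. It fits a linear baseline and a cubic polynomial model with one interaction term by ordinary least squares, then compares their average test MSE under 10-fold cross-validation.

```python
import numpy as np


def design_matrix(size, dist, pool):
    """Cubic terms in size and distance plus a size-by-pool interaction."""
    return np.column_stack([
        np.ones_like(size),          # intercept
        size, size**2, size**3,      # cubic polynomial in home size
        dist, dist**2, dist**3,      # cubic polynomial in distance
        pool,                        # pool indicator (0 or 1)
        size * pool,                 # interaction: size x pool
    ])


def kfold_mse(X, y, k=10, seed=0):
    """Average test MSE of ordinary least squares over k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only, then score on the held-out fold.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ beta
        scores.append(float(np.mean(resid**2)))
    return float(np.mean(scores))


# Synthetic stand-in data (assumption: NOT the Orange County sales records).
rng = np.random.default_rng(42)
n = 1000
size = rng.uniform(0.8, 4.0, n)              # hypothetical: thousands of sq ft
dist = rng.uniform(0.5, 3.0, n)              # hypothetical: distance to a CBD
pool = rng.integers(0, 2, n).astype(float)   # hypothetical: has a pool
price = (50 + 40 * size + 5 * size**2 - 8 * dist
         + 15 * size * pool + rng.normal(0, 5, n))

X_linear = np.column_stack([np.ones(n), size, dist, pool])
X_cubic = design_matrix(size, dist, pool)

mse_linear = kfold_mse(X_linear, price)
mse_cubic = kfold_mse(X_cubic, price)
print(f"linear baseline, 10-fold test MSE:     {mse_linear:.2f}")
print(f"cubic + interaction, 10-fold test MSE: {mse_cubic:.2f}")
```

Because the synthetic prices are generated with a quadratic term and a size-by-pool interaction, the richer model should score a lower cross-validated MSE here. On real sales data the ranking would depend on the features, and the MSE values are not comparable to those reported in the abstract.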