Spatio‐Temporal Forecasting of Divvy Bike‐Share Demand and Trip Duration Using Gradient‐Boosted Decision Trees
Abstract
This study develops and validates a transparent, high-precision framework for forecasting monthly trip volumes and average trip durations in Chicago’s Divvy bike-share system, giving urban planners reliable, data-driven insights without relying on complex deep-learning architectures.

We assembled data for January–December 2024, including Divvy trip records, daily weather (temperature, precipitation, snowfall), and community-area covariates (median income, population, and transit-stop densities). For each community area we engineered nine predictors, including a weekend indicator and one- and seven-month lags of the response (trip count or duration), to capture both demand inertia and seasonal fluctuations. A gradient-boosted decision-tree model (LightGBM) was trained in R, with hyperparameters tuned by grid search. Performance was evaluated with two complementary strategies: (1) a 10-fold “leave-one-community-area-out” spatial cross-validation, which prevents spatial leakage and assesses generalizability across distinct geographic contexts; and (2) a 10% stratified hold-out of community areas, sampled across low, medium, and high ridership (or duration) tiers, which balances the bias-variance trade-off and supports early stopping.

Hyperparameter tuning reduced the cross-validation RMSE for trip counts from 3,314 to 2,341 rides/month, the tuned value being 37.5% of the mean. On the stratified hold-out, the final count model achieved an RMSE of 274 rides/month (8.4% of the hold-out mean). Applying the same pipeline to average trip duration yielded a hold-out RMSE of 0.36 minutes (2% of the mean duration).
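The lagged predictors described above can be sketched as follows. This is a minimal, standard-library Python illustration, not the authors' R code; the record layout (keys `area`, `month`, `y`) and the function name `add_lags` are hypothetical. Lags are looked up within each community area only, so one area's history never leaks into another's features, and months with no lagged value (for example, January's one-month lag when only 2024 data are available) are left missing (`None` here), since LightGBM handles missing values natively.

```python
from collections import defaultdict

def add_lags(records, lags=(1, 7)):
    """Attach lagged responses to each record, computed within its community area.

    records: list of dicts with hypothetical keys 'area', 'month' (1-12, one year)
    and 'y' (the response: monthly trip count or mean duration). Returns new dicts
    with 'lag_1', 'lag_7', ... keys; a lag is None when that month is absent.
    """
    # Index each area's history by month so lookups never cross areas.
    history = defaultdict(dict)
    for r in records:
        history[r["area"]][r["month"]] = r["y"]

    out = []
    for r in records:
        row = dict(r)  # copy so the input records are not mutated
        for k in lags:
            row[f"lag_{k}"] = history[r["area"]].get(r["month"] - k)
        out.append(row)
    return out


# Example: two hypothetical areas with eight months of data each.
rows = [{"area": a, "month": m, "y": m * (100 if a == "A" else 1)}
        for a in ("A", "B") for m in range(1, 9)]
features = add_lags(rows)
```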
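One plausible reading of the 10-fold “leave-one-community-area-out” design is grouped cross-validation: the community areas are partitioned into ten folds, and every row from a held-out area stays out of training for that fold. The sketch below assumes that reading; `group_kfold`, the shuffling seed, and the round-robin fold assignment are illustrative choices, not details taken from the study.

```python
import random

def group_kfold(areas, n_folds=10, seed=0):
    """Yield (train_areas, test_areas) pairs in which each area is tested exactly once.

    Hypothetical helper: it splits whole community areas, never individual rows,
    so no area contributes to both training and evaluation within a fold.
    """
    unique = sorted(set(areas))  # sorting first makes the shuffle reproducible per seed
    random.Random(seed).shuffle(unique)
    folds = [unique[i::n_folds] for i in range(n_folds)]  # round-robin assignment
    all_areas = set(unique)
    for test in folds:
        test_set = set(test)
        yield all_areas - test_set, test_set


# Example: Chicago's 77 community areas, numbered 1-77.
splits = list(group_kfold(range(1, 78)))
```

Each fold's model would then be fit on rows whose area is in `train_areas` and scored on rows whose area is in `test_areas`.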
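The stratified hold-out can be sketched in the same style. Here the low, medium, and high tiers are assumed to be rank-based tertiles of each area's mean response, with roughly 10% of each tier sampled; the tier definition, the rounding rule, and the name `stratified_holdout` are assumptions for illustration, since the abstract does not say how the tiers were cut.

```python
import random

def stratified_holdout(area_means, frac=0.10, n_tiers=3, seed=0):
    """Return a set of held-out areas drawn from every response tier.

    area_means maps each community area to its mean response (trip count or
    duration). Areas are ranked, cut into n_tiers equal-size tiers (tertiles by
    default: low, medium, high), and about `frac` of each tier is sampled.
    """
    ranked = sorted(area_means, key=area_means.get)
    rng = random.Random(seed)
    n = len(ranked)
    holdout = set()
    for t in range(n_tiers):
        tier = ranked[t * n // n_tiers:(t + 1) * n // n_tiers]
        k = max(1, round(frac * len(tier)))  # at least one area per tier
        holdout.update(rng.sample(tier, k))
    return holdout
```

Sampling within tiers keeps the hold-out from being dominated by, say, only low-ridership areas, which would make its RMSE unrepresentative.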
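The error figures above are RMSE values, also expressed as a percentage of the mean observed response, i.e. relative RMSE = 100 × RMSE / ȳ with RMSE = √((1/n) Σ (yᵢ − ŷᵢ)²). A short sketch of both metrics follows; the function names are hypothetical, and normalizing by the mean (rather than, say, the range) is inferred from the abstract's “% of mean” wording.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def relative_rmse(y_true, y_pred):
    """RMSE as a percentage of the mean observed value."""
    mean = sum(y_true) / len(y_true)
    return 100 * rmse(y_true, y_pred) / mean
```

Under this definition, the hold-out RMSE of 274 rides/month at 8.4% implies a hold-out mean of roughly 3,300 rides/month (274 / 0.084 ≈ 3,262).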
Feature-importance analysis revealed that the one-month lag accounts for 96% of the total predictive gain, while weather and spatial context each contribute less than 2%.

A simple LightGBM framework, anchored by lagged demand and enriched with contextual covariates, delivers a relative error of about 8% for trip counts and about 2% for durations, offering a practical and interpretable forecasting tool for urban mobility planning without the need for deep-learning architectures.
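Gain shares such as the 96% figure are typically obtained by summing the split gain each feature contributes across all trees and dividing by the total over all features. The sketch below shows only that normalization step; the feature names and gain values are invented for illustration. In LightGBM's Python package, raw per-feature gains can be read with `Booster.feature_importance(importance_type="gain")`, though the study itself used R.

```python
def gain_shares(gains):
    """Convert raw per-feature gain totals into percentages that sum to 100."""
    total = sum(gains.values())
    if total <= 0:
        raise ValueError("total gain must be positive")
    return {feature: 100 * g / total for feature, g in gains.items()}


# Hypothetical raw gains; names and values are illustrative only.
shares = gain_shares({"lag_1": 960.0, "temperature": 15.0,
                      "median_income": 12.0, "lag_7": 13.0})
```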