Spatio‐Temporal Forecasting of Divvy Bike‐Share Demand and Trip Duration Using Gradient‐Boosted Decision Trees
Abstract
This study develops and validates a transparent, high-precision framework for forecasting monthly trip volumes and average trip durations in Chicago’s Divvy bike-share system, giving urban planners reliable, data-driven insights without relying on complex deep-learning architectures.

We assembled data for January–December 2024, including Divvy trip records, daily weather (temperature, precipitation, snowfall), and community-area covariates (median income, population, and transit-stop densities). We engineered nine predictors for each community area, including a weekend indicator and one- and seven-month lags of the response (trip count or duration), to capture both demand inertia and seasonal fluctuations. A gradient-boosted decision-tree model (LightGBM) was trained in R, with hyperparameters tuned by grid search. Performance was evaluated using two complementary strategies: (1) a 10-fold "leave-one-community-area-out" spatial cross-validation to prevent spatial leakage and assess generalizability across distinct geographic contexts; and (2) a 10% stratified hold-out of community areas, sampled across low, medium, and high ridership (or duration) tiers, to balance the bias-variance trade-off and support early stopping.

Hyperparameter tuning reduced cross-validation RMSE for trip counts from 3,314 to 2,341 rides/month (37.5% of the mean). On the stratified hold-out, the final count model achieved an RMSE of 274 rides/month (8.4% of the hold-out mean). Applying the same pipeline to average trip duration yielded a hold-out RMSE of 0.36 minutes (2% of mean duration).
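The spatial cross-validation and the "RMSE as a percentage of the mean" figures can be illustrated with a minimal sketch. It is written in Python with the standard library only, whereas the study itself used R; the function names (`grouped_folds`, `rmse_pct_of_mean`), the toy area identifiers, and the random fold assignment are assumptions for illustration, not the authors' implementation. The idea shown is that folds are assigned per community area rather than per row, so no area contributes rows to both the training and validation sides of a split.

```python
import math
import random


def grouped_folds(area_ids, n_folds=10, seed=0):
    """Return one fold index per row, assigned at the community-area level."""
    areas = sorted(set(area_ids))
    random.Random(seed).shuffle(areas)  # reproducible shuffle of the areas
    fold_of_area = {area: i % n_folds for i, area in enumerate(areas)}
    return [fold_of_area[a] for a in area_ids]


def rmse(y_true, y_pred):
    """Root-mean-square error of paired observations and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_pct_of_mean(y_true, y_pred):
    """RMSE expressed as a percentage of the mean observed value."""
    return 100.0 * rmse(y_true, y_pred) / (sum(y_true) / len(y_true))


# Toy data (hypothetical): 20 areas with 12 monthly rows each.
area_ids = [f"area_{a:02d}" for a in range(20) for _ in range(12)]
folds = grouped_folds(area_ids)
```

Here every row of a given area shares one fold index, so holding out fold `k` removes whole areas from training. `rmse_pct_of_mean` is the kind of normalization behind statements such as "8.4% of the hold-out mean", applied to held-out observations and their predictions.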
Feature-importance analysis revealed that the one-month lag explains approximately 96% of predictive gain, with weather and spatial context each contributing less than 2%.

A simple LightGBM framework, anchored by lagged demand and enriched with contextual covariates, delivers ≤ 8% error for trip counts and ≤ 2% for durations, offering a practical and interpretable forecasting tool for urban mobility planning without the need for deep-learning architectures.
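Because the lagged response dominates the model, it may help to see how one- and seven-month lags can be built per community area. The following is a minimal Python sketch under stated assumptions: the row layout (`area`, `month`, `trips`), the helper name `add_lag_features`, and the toy values are hypothetical, and the study's actual feature pipeline in R is not shown in the abstract.

```python
def add_lag_features(rows, lags=(1, 7)):
    """Attach lagged trip counts to each row, looked up within the same area.

    rows: list of dicts with keys 'area', 'month' (1-12) and 'trips'.
    A lag that falls before the first observed month is left as None.
    """
    by_key = {(r["area"], r["month"]): r["trips"] for r in rows}
    out = []
    for r in rows:
        feats = dict(r)  # copy so the input rows are not mutated
        for k in lags:
            feats[f"trips_lag{k}"] = by_key.get((r["area"], r["month"] - k))
        out.append(feats)
    return out


# Toy data (hypothetical): two areas, twelve months each.
rows = [
    {"area": area, "month": m, "trips": scale * m}
    for area, scale in (("A", 100), ("B", 10))
    for m in range(1, 13)
]
lagged = add_lag_features(rows)
```

With a single calendar year of data, the seven-month lag is only defined from August onward, so earlier rows carry a missing value. LightGBM can handle missing feature values natively, though how the authors treated these rows is not stated here.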