A Dual Learning Model for Last-Mile Delivery Pricing


Abstract

Last-mile delivery platforms face significant order cancellations due to dual uncertainty: the customer's willingness to pay an adequate delivery fee and the driver's willingness to accept the corresponding payout. Static pricing models fail to adapt to this dual uncertainty, leading to suboptimal market outcomes. This paper addresses the challenge by casting platform pricing as a learning problem. We formulate the interaction as a Stackelberg game in which the platform acts as the leader, making direct price and payout offers. We propose and compare two learning methodologies for the platform: a model-based approach that simultaneously learns the parameters of the customer and driver valuation functions via a gradient-based method, and a model-free Q-learning approach that directly learns an optimal pricing policy. Numerical simulations demonstrate that both learning approaches significantly outperform a non-learning, fixed-rate benchmark policy. Both the model-based and model-free approaches converge toward the true underlying valuations over time, yielding more efficient transaction outcomes and a larger total economic surplus for the platform, customers, and drivers. The results highlight the substantial value of adaptive learning in such a two-sided market: by actively managing the exploration-exploitation trade-off, a platform can uncover latent market dynamics, automate its pricing strategy, and achieve superior long-term profitability, a clear advantage over static pricing mechanisms.
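The model-free approach described above can be illustrated with a minimal sketch. This is not the paper's implementation: the valuation constants, action grid, and learning-rate settings below are illustrative assumptions. The platform posts a (price, payout) offer; a transaction completes only if the price does not exceed the customer's willingness to pay and the payout meets the driver's reservation wage, and the platform's reward is its margin on completed orders. An epsilon-greedy update then learns the margin-maximizing offer without ever modeling the valuation functions.

```python
import random

# Hypothetical ground-truth valuations, unknown to the learner.
CUSTOMER_WTP = 7.0     # customer accepts if price <= willingness to pay
DRIVER_RESERVE = 4.0   # driver accepts if payout >= reservation wage

# Discretized offer grid (illustrative).
PRICES = [5.0, 6.0, 7.0, 8.0]
PAYOUTS = [3.0, 4.0, 5.0]
ACTIONS = [(p, w) for p in PRICES for w in PAYOUTS]

def reward(price, payout):
    # Margin accrues only when both sides accept; otherwise the order cancels.
    if price <= CUSTOMER_WTP and payout >= DRIVER_RESERVE:
        return price - payout
    return 0.0

def train(episodes=5000, alpha=0.1, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        if rng.random() < eps:            # explore a random offer
            a = rng.choice(ACTIONS)
        else:                             # exploit the current best estimate
            a = max(q, key=q.get)
        # Single-step (bandit-style) Q update toward the observed margin.
        q[a] += alpha * (reward(*a) - q[a])
    return q

q = train()
best = max(q, key=q.get)
print(best)  # learned offer: highest accepted price, lowest accepted payout
```

With these assumed valuations, the learner settles on the offer that charges the customer's full willingness to pay while paying exactly the driver's reservation wage, which is the platform-optimal Stackelberg offer in this toy market.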
