# A Framework for Multi-stage Bonus Allocation in Meal Delivery Platforms: Optimizing Driver Incentives Under Budget Constraints
## Research Background
The explosive growth of online meal delivery has transformed urban food logistics, with platforms like Meituan processing over 30 million orders daily in China alone. Yet behind this convenience lies a persistent operational challenge: order cancellations caused by insufficient driver acceptance. When a crowdsourcing driver finds the delivery price unattractive, the order languishes unaccepted, eventually leading to cancellation—what the authors term “NA-canceled orders” (No-Accept canceled orders). On the Meituan platform, approximately 165,000 such cancellations occur every day, accounting for over 55% of negative customer reviews and costing billions of RMB annually in restaurant compensation for wasted food.
The traditional approach to this problem is straightforward: offer bonus payments to drivers to sweeten the deal. Business managers allocate a fixed monthly budget for these incentives, and a simple rule-based system distributes them—for example, adding three RMB after ten minutes of non-acceptance, six RMB after twenty minutes, and so on. While easy to implement, this one-size-fits-all approach is fundamentally inefficient. It treats every order identically at each decision point, ignoring the rich contextual information that determines whether a particular bonus will actually change a driver’s behavior. Some orders have inherently high acceptance probabilities and need no bonus at all; others sit in difficult delivery zones where even a generous bonus may not help. The question is not simply whether to offer a bonus, but how much, to which orders, and at what point in the order’s lifecycle—all while staying within a predetermined budget.
This paper, presented at KDD 2022 by researchers from Meituan and Huazhong University of Science and Technology, proposes a Multi-Stage Bonus Allocation (MSBA) framework that addresses exactly this challenge. Unlike prior work that treated bonus allocation as a single-stage decision, the authors recognize that an order’s lifecycle naturally divides into multiple stages—from initial placement through eventual acceptance, cancellation, or forced termination at the 50-minute mark. At each stage, the platform has an opportunity to reassess and adjust the bonus offered, creating a dynamic, multi-stage optimization problem that must be solved in real time under global budget constraints.
## Methodology Interpretation
The MSBA framework consists of three interconnected components: a semi-black-box acceptance probability model, a Lagrangian dual-based dynamic programming (LDDP) algorithm for offline optimization, and an online allocation algorithm for real-time decision-making. Together, these components bridge the gap between sophisticated offline planning and the millisecond-level response times required in production systems.
The Acceptance Probability Model. At the heart of the framework is a model that predicts how a given bonus amount will affect the probability that a driver accepts a specific order at a specific stage. The authors adopt a semi-black-box approach: they assume the acceptance probability follows a logistic function parameterized by two values, α and β, but these parameters are themselves estimated by a neural network that ingests rich contextual features. These features include the geographic locations of the customer and restaurant, elapsed time since order placement, estimated time of arrival (ETA), local supply-demand dynamics, and spatial information about nearby drivers. The logistic structure ensures a natural interpretation—higher bonuses always increase acceptance probability—while the neural network captures the complex, nonlinear relationships between context and driver behavior. Crucially, α and β are learned simultaneously but through different hidden layers, with bonus-bearing and bonus-free training samples updating different parts of the network. This design addresses a practical data imbalance: only a minority of orders historically received bonuses, so the model must learn baseline acceptance patterns (captured by β) from the majority of zero-bonus orders while learning bonus sensitivity (captured by α) from the smaller bonus-bearing subset.
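The functional form described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two "heads" that produce α and β are stand-ins (here, fixed linear maps over a context vector) for the different hidden layers of the authors' neural network, and the feature vector, weights, and softplus choice are all assumptions made for the sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical stand-ins for the two network heads. In the paper these
# are separate hidden layers of one network, trained on bonus-bearing
# and zero-bonus samples respectively; here they are plain linear maps.
def alpha_head(x, w_a):
    # Bonus sensitivity. A softplus keeps alpha positive, so a larger
    # bonus can never lower the predicted acceptance probability.
    return np.log1p(np.exp(x @ w_a))

def beta_head(x, w_b):
    # Baseline acceptance logit at zero bonus.
    return x @ w_b

def acceptance_prob(x, bonus, w_a, w_b):
    """Semi-black-box form: logistic in the bonus amount, with
    context-dependent parameters alpha(x) and beta(x)."""
    return sigmoid(alpha_head(x, w_a) * bonus + beta_head(x, w_b))
```

The softplus on α is what encodes the paper's interpretability guarantee: whatever the network learns from context, the resulting curve is monotonically increasing in the bonus.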
A separate XGBoost model estimates the cancellation probability at each stage—the likelihood that the customer will cancel the order before it can be accepted. This is important because allocating a bonus to an order that the customer is about to cancel anyway wastes budget. The cancellation model uses order attributes such as distance, environmental conditions, and weather to generate its predictions.
Offline Optimization via LDDP. The core algorithmic contribution is the Lagrangian dual-based dynamic programming method. The overall MSBA problem is a non-linear, non-convex, multi-stage optimization problem: maximize the total number of accepted orders subject to a global budget constraint, where the bonus for each order can be adjusted at each of multiple allocation stages. Solving this directly is computationally intractable. The authors decompose it into two nested sub-problems. The outer problem uses dynamic programming to allocate the total budget across stages: how much of the monthly budget should be spent on stage 1 versus stage 2 versus later stages? The inner problem, at each stage, is a single-stage allocation: given a fixed budget for this stage, what is the optimal bonus for each individual order? The single-stage problem is itself non-convex, but through a change of variables—expressing the bonus as a function of acceptance probability rather than the reverse—it can be reformulated as a convex optimization problem. This allows the application of Lagrangian dual theory, where the budget constraint is relaxed into the objective via a Lagrangian multiplier λ. A bisection algorithm efficiently finds the optimal λ for each stage and budget level. The key output of the offline phase is a set of empirical Lagrangian multipliers, one per allocation stage, computed from historical data.