# A Framework for Multi-stage Bonus Allocation in Meal Delivery Platforms: Optimizing Driver Incentives Under Budget Constraints

## Research Background

The explosive growth of online meal delivery has transformed urban food logistics, with platforms like Meituan processing over 30 million orders daily in China alone. Yet behind this convenience lies a persistent operational challenge: order cancellations caused by insufficient driver acceptance. When a crowdsourcing driver finds the delivery price unattractive, the order languishes unaccepted, eventually leading to cancellation—what the authors term “NA-canceled orders” (No-Accept canceled orders). On the Meituan platform, approximately 165,000 such cancellations occur every day, accounting for over 55% of negative customer reviews and costing billions of RMB annually in restaurant compensation for wasted food.

The traditional approach to this problem is straightforward: offer bonus payments to drivers to sweeten the deal. Business managers allocate a fixed monthly budget for these incentives, and a simple rule-based system distributes them—for example, adding three RMB after ten minutes of non-acceptance, six RMB after twenty minutes, and so on. While easy to implement, this one-size-fits-all approach is fundamentally inefficient. It treats every order identically at each decision point, ignoring the rich contextual information that determines whether a particular bonus will actually change a driver’s behavior. Some orders have inherently high acceptance probabilities and need no bonus at all; others sit in difficult delivery zones where even a generous bonus may not help. The question is not simply whether to offer a bonus, but how much, to which orders, and at what point in the order’s lifecycle—all while staying within a predetermined budget.
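For concreteness, the rule-based baseline can be sketched in a few lines. Only the two tiers quoted above (three RMB after ten minutes, six RMB after twenty) come from the article; later tiers are not specified, so this sketch simply holds the bonus at six RMB, and the function name is hypothetical.

```python
def rule_based_bonus(minutes_unaccepted: float) -> int:
    """Bonus in RMB based only on how long the order has gone unaccepted.

    Tiers beyond 20 minutes are not given in the article; holding the
    bonus at 6 RMB is an assumption made for this sketch.
    """
    if minutes_unaccepted >= 20:
        return 6
    if minutes_unaccepted >= 10:
        return 3
    return 0


# Every order gets the same bonus at the same elapsed time,
# regardless of location, ETA, or nearby driver supply.
for minutes in (5, 10, 25):
    print(minutes, rule_based_bonus(minutes))
```

The function takes no order features at all, which is exactly the inefficiency the paper targets.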

This paper, presented at KDD 2022 by researchers from Meituan and Huazhong University of Science and Technology, proposes a Multi-Stage Bonus Allocation (MSBA) framework that addresses exactly this challenge. Unlike prior work that treated bonus allocation as a single-stage decision, the authors recognize that an order’s lifecycle naturally divides into multiple stages—from initial placement through eventual acceptance, cancellation, or forced termination at the 50-minute mark. At each stage, the platform has an opportunity to reassess and adjust the bonus offered, creating a dynamic, multi-stage optimization problem that must be solved in real time under global budget constraints.

## Methodology Interpretation

The MSBA framework consists of three interconnected components: a semi-black-box acceptance probability model, a Lagrangian dual-based dynamic programming (LDDP) algorithm for offline optimization, and an online allocation algorithm for real-time decision-making. Together, these components bridge the gap between sophisticated offline planning and the millisecond-level response times required in production systems.

The Acceptance Probability Model. At the heart of the framework is a model that predicts how a given bonus amount affects the probability that a driver accepts a specific order at a specific stage. The authors adopt a semi-black-box approach: they assume the acceptance probability follows a logistic function of the bonus, parameterized by two values, α and β, and these parameters are themselves estimated by a neural network that ingests rich contextual features. These features include the geographic locations of the customer and restaurant, elapsed time since order placement, estimated time of arrival (ETA), local supply-demand dynamics, and spatial information about nearby drivers. The logistic structure gives the model a natural interpretation (as long as the bonus sensitivity α is positive, a higher bonus never lowers the acceptance probability), while the neural network captures the complex, nonlinear relationships between context and driver behavior. Crucially, α and β are learned simultaneously but through different hidden layers, with bonus-bearing and bonus-free training samples updating different parts of the network. This design addresses a practical data imbalance: only a minority of orders historically received bonuses, so the model must learn baseline acceptance patterns (captured by β) from the majority of zero-bonus orders while learning bonus sensitivity (captured by α) from the smaller bonus-bearing subset.
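As a rough illustration of the logistic structure, the sketch below assumes the form sigmoid(α · bonus + β). The article does not give the exact parameterization, so the formula, the function name, and the sample α and β values are all illustrative assumptions. In the framework itself, α and β would be produced per order by the neural network rather than hard-coded.

```python
import math


def acceptance_probability(bonus: float, alpha: float, beta: float) -> float:
    """Assumed logistic form: sigmoid(alpha * bonus + beta).

    beta sets the baseline acceptance probability at zero bonus;
    alpha controls how strongly the bonus shifts it.
    """
    return 1.0 / (1.0 + math.exp(-(alpha * bonus + beta)))


# Hypothetical parameters: same bonus sensitivity, different baselines.
easy_order = {"alpha": 0.3, "beta": 1.5}   # likely accepted even without a bonus
hard_order = {"alpha": 0.3, "beta": -2.0}  # unlikely to be accepted at zero bonus

for bonus in (0.0, 3.0, 6.0):
    p_easy = acceptance_probability(bonus, **easy_order)
    p_hard = acceptance_probability(bonus, **hard_order)
    print(f"bonus={bonus:.0f} RMB  easy={p_easy:.3f}  hard={p_hard:.3f}")
```

Under these assumed numbers, the same bonus moves the hard order's probability far more than the easy order's, which is the kind of difference a uniform rule cannot exploit.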

A separate XGBoost model estimates the cancellation probability at each stage—the likelihood that the customer will cancel the order before it can be accepted. This is important because allocating a bonus to an order that the customer is about to cancel anyway wastes budget. The cancellation model uses order attributes such as distance, environmental conditions, and weather to generate its predictions.
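To see how per-stage acceptance and cancellation estimates might combine over an order's lifecycle, the sketch below computes the chance that an order is eventually accepted. It assumes that within each stage a driver first either accepts or does not, that the customer may then cancel, and that the order otherwise moves on to the next stage. The article does not spell out this ordering, and the function name and sample numbers are hypothetical.

```python
def eventual_acceptance_probability(accept_probs, cancel_probs):
    """Probability that an order is accepted in some stage.

    accept_probs[t]: assumed probability a driver accepts during stage t.
    cancel_probs[t]: assumed probability the customer cancels at the end
                     of stage t, given the order is still unaccepted.
    """
    still_open = 1.0  # probability the order reaches the current stage
    accepted = 0.0
    for p_accept, p_cancel in zip(accept_probs, cancel_probs):
        accepted += still_open * p_accept
        still_open *= (1.0 - p_accept) * (1.0 - p_cancel)
    return accepted


# Hypothetical three-stage order with a rising cancellation risk.
print(eventual_acceptance_probability([0.3, 0.4, 0.5], [0.05, 0.2, 0.4]))
```

A high cancellation probability shrinks the weight of every later stage, so a bonus reserved for those stages is less likely to ever matter.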

Offline Optimization via LDDP. The core algorithmic contribution is the Lagrangian dual-based dynamic programming method. The overall MSBA problem is a non-linear, non-convex, multi-stage optimization problem: maximize the total number of accepted orders subject to a global budget constraint, where the bonus for each order can be adjusted at each of multiple allocation stages. Solving this directly is computationally intractable. The authors decompose it into two nested sub-problems. The outer problem uses dynamic programming to allocate the total budget across stages: how much of the monthly budget should be spent on stage 1 versus stage 2 versus later stages? The inner problem, at each stage, is a single-stage allocation: given a fixed budget for this stage, what is the optimal bonus for each individual order? The single-stage problem is itself non-convex, but through a change of variables—expressing the bonus as a function of acceptance probability rather than the reverse—it can be reformulated as a convex optimization problem. This allows the application of Lagrangian dual theory, where the budget constraint is relaxed into the objective via a Lagrangian multiplier λ. A bisection algorithm efficiently finds the optimal λ for each stage and budget level. The key output of the offline phase is a set of empirical Lagrangian multipliers, one per allocation stage, computed from historical data.
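A minimal sketch of the single-stage step, under stated assumptions: acceptance follows the logistic form sigmoid(α · bonus + β), the stage's cost is the expected payout (bonus times acceptance probability), and bonuses are chosen from a small discrete grid. The article confirms none of these details, and the grid search merely stands in for the convex reformulation described above. Only the idea of relaxing the budget with a multiplier λ and locating it by bisection comes from the text. All function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def expected_cost(alpha: float, beta: float, bonus: float) -> float:
    """Expected payout, assuming the bonus is paid only on acceptance."""
    return sigmoid(alpha * bonus + beta) * bonus


def best_bonus(alpha: float, beta: float, lam: float, grid):
    """Grid bonus maximizing p(b) - lam * expected_cost(b) for one order."""
    return max(
        grid,
        key=lambda b: sigmoid(alpha * b + beta) - lam * expected_cost(alpha, beta, b),
    )


def allocate_stage(orders, budget: float, grid, iters: int = 50):
    """Bisection on lam so the relaxed solution fits the stage budget.

    orders is a list of (alpha, beta) pairs. The grid must contain 0.0 and
    the budget must be non-negative, otherwise no feasible lam exists.
    """
    def solve(lam):
        bonuses = [best_bonus(a, b, lam, grid) for a, b in orders]
        cost = sum(expected_cost(a, b, x) for (a, b), x in zip(orders, bonuses))
        return bonuses, cost

    lo, hi = 0.0, 1.0
    while solve(hi)[1] > budget:  # grow hi until spending fits the budget
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if solve(mid)[1] > budget:
            lo = mid  # still overspending: penalize cost more
        else:
            hi = mid  # feasible: try a smaller penalty
    bonuses, cost = solve(hi)
    return hi, bonuses, cost


grid = [0.5 * k for k in range(21)]              # 0.0 to 10.0 RMB
orders = [(0.4, -1.0), (0.2, 0.5), (0.6, -2.0)]  # hypothetical (alpha, beta)
lam, bonuses, cost = allocate_stage(orders, budget=3.0, grid=grid)
print(f"lambda={lam:.4f} bonuses={bonuses} expected_cost={cost:.3f}")
```

Because λ prices one unit of budget, each order can be solved independently once λ is fixed. This is presumably why storing one empirical multiplier per stage is enough for fast decisions later.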
