A Multi-stage Bonus Allocation Framework for Meal Delivery Platforms
This research proposes a multi-stage bonus allocation framework to address order acceptance challenges in meal delivery platforms. Through dynamic bonus incentives, the framework can reduce canceled orders by over 60% within a limited budget while saving restaurants more than 30% in food waste compensation.
Research Background
The online meal delivery industry is experiencing explosive growth, becoming an essential service in our daily lives. Meituan, China’s most popular meal delivery platform, for instance, handles 30 million meal orders every day. The platform’s core objective is to provide excellent and stable services to both restaurants and customers. However, in reality, hundreds of thousands of orders on the Meituan platform are canceled daily because they are not accepted by crowdsourcing drivers. Such order cancellations are severely detrimental to customer repurchase rates and the overall reputation of the Meituan meal delivery platform. To address this critical issue, Meituan’s business managers allocate a specific budget to incentivize crowdsourcing drivers to accept more orders. This research proposes a framework to tackle the multi-stage bonus allocation problem for meal delivery platforms, aiming to maximize the number of accepted orders within a limited bonus budget.
Cancellations due to unaccepted orders (NA-canceled orders) are a primary cause of negative ratings for the platform. Statistics show that Meituan receives approximately 30,000 negative reviews daily, with over 55% stemming from NA-canceled orders. Furthermore, around 165,000 NA-canceled orders occur each day, leading to reduced income for crowdsourcing drivers, increased food waste for restaurants, and a damaged platform reputation. The meal delivery platform is responsible for compensating restaurants for food waste caused by such cancellations, amounting to billions of RMB annually. Historical data analysis reveals two main reasons for NA-canceled orders: first, the delivery price for some orders is not attractive enough to drivers, even when driver supply is sufficient; second, in certain situations, such as stormy weather, there might be an insufficient number of online drivers to fulfill incoming orders. This paper primarily focuses on the first scenario, encouraging drivers to accept more orders through incentives, while the second case, driven by extreme weather, is beyond its scope.
Methodology Interpretation
This paper introduces a Multi-Stage Bonus Allocation (MSBA) framework designed to tackle the bonus allocation challenge on meal delivery platforms, with the overarching goal of maximizing accepted orders within a constrained budget. The framework comprises four key components: an acceptance and cancellation model, a Lagrangian Dual-based Dynamic Programming (LDDP) algorithm, an online allocation algorithm, and a periodic control strategy.
First, the acceptance and cancellation model forms the foundation of the framework. The acceptance model forecasts the relationship between the bonus assigned to an order and its probability of being accepted. The paper employs a semi-black-box predictive model, assuming that the acceptance probability follows a logistic form: $p_{i,t}(c_{i,t}) = 1 / (1 + e^{\alpha_{i,t} c_{i,t} + \beta_{i,t}})$. Here, $\alpha_{i,t}$ and $\beta_{i,t}$ are produced by machine learning models, such as neural networks. The input features for this forecasting model fall into two categories: $c_{i,t}$, the bonus allocated to the order, and $\boldsymbol{x}_{i,t}$, the contextual features. The contextual features cover intrinsic attributes of the order, including the geographical locations of the customer and restaurant, the time elapsed since the user placed the order, the estimated time of arrival (ETA), driver supply and demand conditions, and drivers' spatial information (e.g., the number of drivers within a 2-kilometer radius of the restaurant). The training set is constructed from historical observations, with $\alpha_{i,t}$ and $\beta_{i,t}$ learned simultaneously but through distinct hidden layers to address the uneven distribution of bonus-allocated order samples in the training data. Crucially, $\alpha_{i,t}$ is expected to be less than 0, so that a higher bonus leads to a greater acceptance probability. The cancellation probability $q_{|T|}$ at each stage also influences decisions. The authors train an XGBoost model to predict the cancellation probability at each stage and then calibrate its output: the predicted values are divided into intervals of width 0.05, and orders of each type are sampled repeatedly to determine the proportion of positive samples within each interval. That proportion is then taken as the actual cancellation probability for orders in that interval.
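The two modeling steps above can be illustrated with a short Python sketch. It is not the authors' code: the function names are hypothetical, the alpha and beta values are passed in directly rather than produced by a neural network, and the scores are assumed to come from any classifier (the paper uses XGBoost). The calibration step simply replaces each raw score with the observed positive rate of its 0.05-wide bin.

```python
import numpy as np

def accept_prob(alpha, beta, bonus):
    """Logistic acceptance probability p(c) = 1 / (1 + exp(alpha * c + beta)).

    With alpha < 0, a larger bonus gives a larger acceptance probability.
    """
    return 1.0 / (1.0 + np.exp(alpha * bonus + beta))

def calibrate_by_bins(scores, labels, width=0.05):
    """Return the observed share of positive labels for each score bin.

    scores: predicted cancellation probabilities in [0, 1]
    labels: 1 if the order was actually canceled, else 0
    Bins with no samples are reported as NaN.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n_bins = int(round(1.0 / width))
    # A score of exactly 1.0 is placed in the last bin.
    bins = np.minimum((scores / width).astype(int), n_bins - 1)
    rates = np.full(n_bins, np.nan)
    for b in np.unique(bins):
        rates[b] = labels[bins == b].mean()
    return rates
```

For an order whose raw score falls in bin b, the calibrated cancellation probability in this sketch would be rates[b].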
Second, the Lagrangian Dual-based Dynamic Programming (LDDP) algorithm serves as the framework's cornerstone for offline optimization. Because the overall allocation problem (Problem (3) in the paper) is a non-linear, non-convex multi-stage optimization problem that is intractable in practice, the authors decompose it into two sub-problems. First, the total budget is allocated across the stages. Second, within each stage, the optimal bonus for each order is calculated. The former is handled by dynamic programming, while the latter is a standard single-stage allocation problem solved using Lagrangian dual theory. By computing the empirical Lagrangian multiplier $\lambda_t^*$ for each allocation stage, the algorithm provides the parameters needed for online allocation.
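The two-level decomposition can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the bonus grid is hypothetical, a bonus is counted as spent only when the order is accepted (expected spend p * c), each stage is treated as an independent set of orders (in the paper, stages are coupled because unaccepted orders carry over), and the multiplier for a stage is found by bisection.

```python
import numpy as np

BONUS_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # hypothetical candidate bonuses (RMB)

def accept_prob(alpha, beta, c):
    return 1.0 / (1.0 + np.exp(alpha * c + beta))

def stage_choice(alpha, beta, lam):
    """Each order independently maximizes p(c) - lam * p(c) * c over the grid.

    Returns (expected accepted orders, expected spend) for the stage.
    """
    p = accept_prob(alpha[:, None], beta[:, None], BONUS_GRID[None, :])
    cost = p * BONUS_GRID[None, :]
    idx = np.argmax(p - lam * cost, axis=1)
    rows = np.arange(len(alpha))
    return p[rows, idx].sum(), cost[rows, idx].sum()

def solve_stage(alpha, beta, budget, iters=50):
    """Bisection on the multiplier; the returned lam keeps expected spend <= budget."""
    accepted, spend = stage_choice(alpha, beta, 0.0)
    if spend <= budget:
        return 0.0, accepted
    lo, hi = 0.0, 1.0
    while stage_choice(alpha, beta, hi)[1] > budget:  # grow hi until feasible
        lo, hi = hi, hi * 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if stage_choice(alpha, beta, mid)[1] > budget:
            lo = mid
        else:
            hi = mid
    return hi, stage_choice(alpha, beta, hi)[0]

def split_budget(stages, total_units, unit):
    """Dynamic program over a discretized budget.

    stages: list of (alpha, beta) arrays, one pair per stage
    Returns (budget per stage, total expected accepted orders).
    """
    T = len(stages)
    # value[t][k]: expected accepted orders in stage t when it receives k units
    value = [[solve_stage(a, b, k * unit)[1] for k in range(total_units + 1)]
             for a, b in stages]
    best = np.zeros((T + 1, total_units + 1))
    pick = np.zeros((T, total_units + 1), dtype=int)
    for t in range(T - 1, -1, -1):
        for k in range(total_units + 1):
            cands = [value[t][j] + best[t + 1][k - j] for j in range(k + 1)]
            pick[t][k] = int(np.argmax(cands))
            best[t][k] = cands[pick[t][k]]
    alloc, k = [], total_units
    for t in range(T):  # backtrack the chosen units per stage
        alloc.append(float(pick[t][k] * unit))
        k -= int(pick[t][k])
    return alloc, float(best[0][total_units])
```

Calling solve_stage again with each stage's allocated budget would yield one multiplier per stage, which is the quantity the online phase consumes.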
Third, the online allocation algorithm uses the optimized parameters $\lambda_t^*$ obtained from the offline phase to compute the appropriate delivery bonus for each order in real time, within milliseconds. The algorithm transforms the problem into a series of separable minimization problems, which keeps online decisions efficient and responsive, with an online computational complexity of O(1).
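A minimal Python sketch of the per-order online step is shown below. It assumes a fixed, hypothetical grid of candidate bonuses and the same illustrative per-order objective, -p(c) + lam * p(c) * c, where lam is the multiplier delivered by the offline phase. The paper's exact objective and solution method may differ. Because the grid size is constant, the work per order does not depend on how many orders are in the system.

```python
import math

BONUS_GRID = (0.0, 0.5, 1.0, 1.5, 2.0)  # hypothetical candidate bonuses (RMB)

def accept_prob(alpha, beta, c):
    """Logistic acceptance probability for a single order."""
    return 1.0 / (1.0 + math.exp(alpha * c + beta))

def online_bonus(alpha, beta, lam, grid=BONUS_GRID):
    """Pick the bonus minimizing -p(c) + lam * p(c) * c for one order.

    alpha, beta: the order's acceptance-model parameters (alpha < 0)
    lam:         stage multiplier computed offline
    """
    def objective(c):
        p = accept_prob(alpha, beta, c)
        return -p + lam * p * c
    return min(grid, key=objective)
```

With lam = 0 the budget has no weight and the largest bonus is chosen; as lam grows, spending is penalized more heavily and the chosen bonus falls toward zero.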
Finally, periodic control strategies dynamically adjust the remaining budget and order set so that costs adhere to the total budget constraint. The offline decision-making system runs daily, with the target budget for offline training calculated from the previous month's order data. In addition, a simple real-time expenditure control is applied: if real-time expenditure exceeds 110% of the total budget, bonuses are proportionally reduced; if it falls below 90%, bonuses are increased. The size of the upward or downward adjustment is positively correlated with the gap between real-time expenditure and the total budget, which keeps online expenditure within an acceptable range of the predefined budget.
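The expenditure control rule can be sketched as a small Python function. The 110% and 90% thresholds come from the description above; everything else is an assumption made for illustration, including the function name, the linear form of the adjustment, the gain parameter, and the reading of "budget" as the amount planned for the period being compared.

```python
def adjustment_factor(spent, planned, gain=1.0, upper=1.10, lower=0.90):
    """Return a multiplier to apply to all bonuses.

    spent:   real-time expenditure so far
    planned: budget that expenditure is compared against (assumed > 0)
    gain:    hypothetical sensitivity; a larger gap gives a larger correction
    """
    if planned <= 0:
        raise ValueError("planned budget must be positive")
    ratio = spent / planned
    if ratio > upper:
        # Overspending: scale bonuses down, never below zero.
        return max(0.0, 1.0 - gain * (ratio - 1.0))
    if ratio < lower:
        # Underspending: scale bonuses up.
        return 1.0 + gain * (1.0 - ratio)
    return 1.0  # within the tolerated band, leave bonuses unchanged
```

Multiplying each computed bonus by this factor leaves spending untouched inside the 90% to 110% band and corrects more strongly the further expenditure drifts outside it.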
Core Findings
Both offline experiments and online A/B tests support the effectiveness and efficiency of the Multi-Stage Bonus Allocation (MSBA) framework.
The offline experiments were conducted using real-world datasets from the Meituan meal delivery platform, encompassing diverse cities like Lanzhou, Nanchang, Weihai, and Chengdu, each representing different order scales over a week. Results demonstrated that, compared to scenarios without any bonus allocation, the MSBA framework achieved over a 60% reduction in canceled orders. More notably, when contrasted with the unified bonus mechanism and single-stage bonus allocation methods, MSBA decreased canceled orders by approximately 20% and 40%, respectively. In contexts with larger order volumes, MSBA consistently outperformed these alternative approaches. Further analysis of the total bonus allocated at each stage revealed a trend where allocations initially increased before subsequently decreasing as allocation stages progressed. This pattern is primarily attributed to the majority of orders being accepted promptly during the initial allocation stage, with the number of accepted orders diminishing in later stages. The initial increase in bonus allocation aims to incentivize more “Type B orders” (i.e., those with lower acceptance probabilities), while the subsequent decrease in total bonus allocation is due to a reduction in remaining orders. The study also observed that increasing the number of allocation stages correlates with a rise in accepted orders; however, beyond 10 stages, the improvement rate diminishes, and excessively frequent bonus adjustments negatively impact crowdsourcing driver experience. Consequently, balancing order acceptance rates with driver experience suggests that an optimal number of allocation stages should not exceed 10.
Online A/B tests were carried out across five Chinese cities, covering 120 delivery zones and 4.96 million orders daily. Orders were randomly and equally divided into three groups, which used the multi-stage, single-stage, and unified allocation methods, respectively. Under identical budget constraints (0.2 RMB per order), the MSBA framework again performed best. The NA-canceled order ratio decreased by over 25% compared to the single-stage allocation method and by more than 29% compared to the unified allocation method. Beyond operational efficiency, the framework also saved over 30% of the compensation paid to restaurants for food waste. These results indicate that MSBA has substantial potential to improve platform operational efficiency, reduce costs, and improve the overall user experience.
Criticism and Limitations
Despite the significant achievements of the proposed Multi-Stage Bonus Allocation (MSBA) framework in meal delivery platforms, there are several criticisms and limitations that warrant discussion.
Firstly, the paper explicitly states that its primary focus is on enhancing order acceptance rates through bonus incentives in situations where there is an ample supply of drivers but a lack of order attractiveness. The second scenario, where driver shortages are caused by factors such as extreme weather, is explicitly excluded from the study’s scope. This narrow focus inherently limits the applicability of the model to a specific set of circumstances. In the real world, meal delivery platforms frequently encounter issues of driver supply-demand imbalance, particularly during adverse weather conditions or peak hours. If the model cannot effectively address these complex situations, its capacity for comprehensive optimization of platform operations will be restricted. Future research could explore integrating supply-demand equilibrium and extreme weather factors into a multi-stage bonus allocation decision framework to develop more robust and universally applicable solutions.
Secondly, while the “semi-black-box acceptance probability model” mentioned in the paper offers a degree of flexibility, its internal mechanisms and parameters ($\alpha_{i,t}$ and $\beta_{i,t}$) are derived from machine learning models like neural networks. This reliance can lead to relatively low interpretability of the model. In practical applications, business stakeholders may require more intuitive and easily understandable rules for strategy adjustments. A purely black-box model might face challenges in terms of trust and adaptability in certain decision-making scenarios. Furthermore, the model’s dependence on contextual features implies that its performance is susceptible to the quality of feature engineering and the real-time availability of data. If these features fail to accurately and promptly reflect actual conditions, the predictive accuracy of the model could be significantly compromised.
Thirdly, the LDDP algorithm computes empirical Lagrangian multipliers during an offline phase, and these multipliers are then utilized for real-time online decision-making. While this offline-online separation strategy guarantees millisecond-level efficiency for online decisions, the accuracy and timeliness of the offline computation are paramount. Should the historical dataset prove insufficient to capture dynamic market changes, or if the model update frequency is inadequate to respond to rapidly evolving market environments, the offline-derived multipliers may not optimally guide online allocations, thereby affecting overall effectiveness. Additionally, while discretizing the budget simplifies algorithmic complexity, it may also, to some extent, compromise the precision of the optimal solution.
Finally, while the periodic control strategy enables dynamic adjustment of the budget and order set, its adjustment ratio is positively correlated with the difference between real-time expenditure and the total budget. This linear adjustment mechanism might be overly simplistic; in complex dynamic environments, market responses are often non-linear. For example, in extreme scenarios of severe budget deficit or surplus, a simple proportional adjustment may not achieve optimal results and could even introduce new problems. A more refined, adaptive control mechanism could potentially further enhance the framework’s robustness and decision quality.
Practical Implications
The Multi-Stage Bonus Allocation (MSBA) framework proposed in this research offers significant practical implications for meal delivery platforms and the broader on-demand service sector. Its core value lies in its ability to effectively increase order acceptance rates, reduce cancellation rates, and substantially decrease platform compensation costs resulting from canceled orders, all through sophisticated dynamic bonus strategies.
Firstly, the importance of refined multi-stage bonus strategies is highlighted. Traditional unified bonus or single-stage allocation methods are often inadequate in adapting to the dynamic changes in acceptance probabilities throughout an order’s lifecycle. The MSBA framework addresses this by segmenting an order’s lifecycle into multiple decision stages and dynamically adjusting bonuses based on the specific characteristics of each order at each stage (e.g., waiting time, supply-demand status, geographical location). This suggests that platforms should abandon “one-size-fits-all” bonus policies in favor of more adaptive and forward-looking multi-stage incentive mechanisms. For instance, for orders that remain unaccepted for extended periods, bonuses can be gradually increased, though platforms must be wary of excessively high bonuses creating a negative incentive for drivers to “wait for bonuses.”
Secondly, data-driven prediction and optimization are foundational. The MSBA framework heavily relies on accurate predictive models for acceptance and cancellation probabilities. This implies that meal delivery platforms must invest resources in building robust data analytics and machine learning capabilities to capture the intricate relationships between orders, drivers, environmental factors, and historical data. By utilizing semi-black-box models (e.g., combining neural networks and XGBoost), platforms can gain a better understanding of how bonuses influence driver behavior and optimize decisions accordingly. For supply chain practitioners, this underscores the necessity of integrating big data analytics and AI technologies into operational decisions, transitioning from experience-driven to data-intelligence-driven approaches.
Thirdly, the integration of offline optimization with online real-time decision-making is a key insight. The LDDP algorithm’s offline training enables it to learn optimal Lagrangian multipliers from historical data, while the online allocation algorithm can then leverage these offline-derived parameters for real-time decisions with O(1) complexity. This architectural design perfectly balances decision precision with real-time responsiveness. For supply chain scenarios requiring rapid responses to market fluctuations, such as instant retail and fresh produce delivery, this offline-online hybrid optimization model offers broad applicability. It encourages enterprises to conduct complex global optimizations in the backend while deploying lightweight, efficient decision engines at the frontend.
Furthermore, the practical value of periodic budget control is evident. Facing the challenge of a limited monthly budget and uncertain future order conditions, MSBA’s periodic control strategy provides a flexible solution for budget management. By daily adjusting the target budget for offline training and dynamically modifying bonus ratios based on real-time expenditure, the platform can effectively manage overall costs while maximizing benefits. This is highly relevant for any dynamic pricing or incentive system constrained by budgets, particularly in supply chain cost management, where it can help companies achieve more efficient capital utilization without compromising service quality.
Finally, the model’s universality and expansion potential are noteworthy. The paper indicates that the proposed algorithm can also be applied to similar time-series pricing problems, such as designing discount strategies for expiring products in supermarkets or formulating pricing strategies for perishable goods. This opens new avenues for supply chain managers in other industries, inspiring them to apply multi-stage optimization and dynamic pricing concepts to their own business scenarios, thereby improving inventory turnover efficiency, reducing losses, and optimizing revenue. For example, in retail supply chains, dynamic pricing for seasonal or near-expiration products can effectively reduce inventory buildup and waste.
Paper Citation
Wu, Z., Wang, L., Huang, F., Zhou, L., Song, Y., Ye, C., Nie, P., Ren, H., Hao, J., He, R., & Sun, Z. (2022). A Framework for Multi-stage Bonus Allocation in Meal Delivery Platform. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’22), August 14-18, 2022, Washington, DC, USA. ACM, New York, NY, USA, 9 pages. DOI: https://doi.org/10.1145/3534678.3539095
Source: https://dl.acm.org/doi/10.1145/3534678.3539095
This article was AI-assisted, based on academic paper analysis, for reference purposes only.







