How Meituan Uses Gaussian Mixture Models to Optimize Food Delivery: New Research from Tsinghua University

1. Research Background: The Challenge of Uncertainty in Food Delivery

In today’s rapidly evolving on-demand delivery industry, food delivery platforms make an enormous number of order-scheduling decisions every day. Platforms like Meituan, Deliveroo, and DoorDash must complete the entire process from order acceptance to dispatch and delivery within minutes. However, the real world is full of uncertainty: fluctuating restaurant preparation times, changing traffic conditions for riders, and delays when customers receive their orders all make delivery optimization exceptionally complex.

Among these variables, service time (the duration from a rider’s arrival at the restaurant to pickup and departure) is a critical yet difficult-to-predict factor. Traditional methods often estimate service time using fixed values or simple statistics, but this ignores its inherent randomness and multi-modal distribution characteristics. A research team from Tsinghua University’s Department of Automation, in collaboration with Meituan, has proposed a Gaussian Mixture Model-based approach to service time modeling, offering a new solution to this problem.

The research makes three core contributions: it is the first to apply Gaussian Mixture Models (GMM) to food delivery service time modeling; it proposes a Hybrid Estimation of Distribution Algorithm (HEDA) to solve for the GMM parameters efficiently; and it validates the approach through online A/B testing on Meituan’s live platform. Results show that introducing the uncertainty model significantly improved overall delivery efficiency, shortened riders’ average delivery time, and enhanced customer satisfaction.

2. Problem Definition: Formal Modeling of Stochastic Service Time

Food delivery service time is influenced by multiple factors: restaurant type (fast food vs. full service), time of day (peak vs. off-peak), weather conditions, rider experience, etc. These factors cause service time to exhibit complex multi-modal distribution characteristics—weekday lunch peaks follow one pattern, weekend dinners another, and rainy days yet another.

Objective Function: Maximize the log-likelihood of the observed service times under the estimated distribution:

$$\max_{\theta} \sum_{i=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k \cdot \mathcal{N}(x_i \mid \mu_k, \sigma_k^2) \right)$$

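where $x_i$ is the $i$-th of $N$ observed service times, $\theta = \{\pi_k, \mu_k, \sigma_k^2\}_{k=1}^{K}$ are the GMM parameters, $\pi_k$, $\mu_k$, and $\sigma_k^2$ are the weight, mean, and variance of the $k$-th Gaussian component, and $\mathcal{N}$ is the Gaussian probability density function.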

Constraints:

Weights sum to 1: $\sum_{k=1}^{K} \pi_k = 1$
Weights non-negative: $\pi_k \geq 0$
Variances positive: $\sigma_k^2 > 0$
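To make the objective concrete, here is a minimal Python sketch (assuming NumPy and SciPy) that evaluates the log-likelihood above for a given parameter set and rejects parameters that violate the three constraints. The function name `gmm_log_likelihood` and the toy numbers are illustrative, not from the paper; computing in log space with `logsumexp` is a standard numerical choice rather than something the article specifies.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm


def gmm_log_likelihood(x, weights, means, variances):
    """Return sum_i log( sum_k pi_k * N(x_i | mu_k, sigma_k^2) )."""
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Feasibility checks mirroring the three constraints.
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    if np.any(variances <= 0):
        raise ValueError("variances must be positive")

    # log pi_k + log N(x_i | mu_k, sigma_k^2) for every point/component pair, shape (N, K).
    # SciPy's norm is parameterised by the standard deviation, hence np.sqrt.
    with np.errstate(divide="ignore"):  # a zero weight gives log(0) = -inf, which is harmless here
        log_pi = np.log(weights)
    log_terms = log_pi + norm.logpdf(x[:, None], loc=means, scale=np.sqrt(variances))
    # logsumexp over components avoids underflow when individual densities are tiny.
    return float(logsumexp(log_terms, axis=1).sum())


# Toy service times (minutes) and a hypothetical two-component model.
times = np.array([4.0, 5.5, 11.0, 12.5])
print(gmm_log_likelihood(times, [0.5, 0.5], [5.0, 12.0], [1.0, 1.0]))
```

With a single component the function reduces to an ordinary Gaussian log-likelihood, which is a convenient sanity check.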

This problem poses three challenges: first, the optimal number of components $K$ is unknown and must be determined automatically; second, the objective function is non-convex and has multiple local optima; third, online real-time prediction demands high computational efficiency.

3. Methodology: Gaussian Mixture Models and Hybrid Estimation of Distribution Algorithm

A Gaussian Mixture Model (GMM) is a probabilistic model that assumes the data are generated by a weighted combination of several Gaussian distributions. The research team recast service time distribution estimation as a clustering problem, learning the GMM parameters by determining the probability that each data point belongs to each component. The advantage of this approach is that it does not require assuming in advance that service time follows a specific distribution; instead, the data determine the most suitable distribution form.
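The clustering view rests on one quantity: the posterior probability that observation $x_i$ came from component $k$. This minimal Python sketch (NumPy and SciPy assumed; the function name and all example parameters are hypothetical) computes these soft memberships with Bayes' rule, and taking the most probable component per point turns them into hard cluster labels.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm


def responsibilities(x, weights, means, stds):
    """gamma[i, k] = P(component k | x_i): the soft cluster membership of each point."""
    x = np.asarray(x, dtype=float)[:, None]  # shape (N, 1)
    # Unnormalised log posterior: log pi_k + log N(x_i | mu_k, sigma_k^2), shape (N, K).
    log_post = np.log(weights) + norm.logpdf(x, loc=means, scale=stds)
    # Bayes' rule: normalise each row (in log space) so memberships sum to 1.
    return np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))


# Hypothetical two-component model (quick vs. slow pickups), in minutes.
gamma = responsibilities([4.0, 8.0, 13.0], weights=[0.6, 0.4], means=[5.0, 12.0], stds=[1.5, 2.5])
hard_labels = gamma.argmax(axis=1)  # hard clustering: most probable component per point
print(np.round(gamma, 3), hard_labels)
```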

To solve for the GMM parameters efficiently, the team proposed a Hybrid Estimation of Distribution Algorithm (HEDA) with four key innovations:

1. Problem-Specific Encoding and Decoding Methods: The researchers designed an encoding scheme tailored to the clustering formulation, turning a complex parameter optimization into a more manageable representation. This encoding keeps every solution feasible while simplifying the search space.

2. Chinese Restaurant Process (CRP)-Based Initialization Mechanism: CRP is a non-parametric Bayesian method that determines the number of clusters automatically rather than requiring it to be specified in advance. CRP initialization lets the algorithm generate high-quality initial solutions, laying a good foundation for subsequent optimization.

3. Weighted Learning Mechanism: During algorithm iteration, solutions of different qualities contribute differently to probability model updates. The weighted learning mechanism effectively utilizes information from high-quality solutions, guiding search in better directions.

4. Maximum Likelihood-Based Local Intensification: Building on the global search, the algorithm adds a local search that further exploits the neighborhoods of high-quality solutions through maximum likelihood estimation, improving solution precision.
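The article does not give HEDA's actual encoding or update rules, so the following is only a hypothetical Python sketch (NumPy assumed) of two of the ideas: a CRP-sampled initial partition, and a weighted update of a probability model over point-to-cluster assignments, a common pattern in estimation-of-distribution algorithms. The label-vector encoding, the rank-based weights, and the names `crp_partition`, `weighted_model_update`, `alpha`, and `lr` are assumptions made for illustration.

```python
import numpy as np


def crp_partition(n, alpha, rng):
    """Sample cluster labels for n points from a Chinese Restaurant Process.

    Point i joins existing cluster k with probability n_k / (i + alpha) and opens
    a new cluster with probability alpha / (i + alpha), so the number of clusters
    is not fixed in advance (larger alpha tends to produce more clusters).
    """
    labels = np.empty(n, dtype=int)
    sizes = []  # current size n_k of each cluster
    for i in range(n):
        probs = np.array(sizes + [alpha], dtype=float) / (i + alpha)
        k = int(rng.choice(len(probs), p=probs))
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1
        labels[i] = k
    return labels


def weighted_model_update(prob, elite_labels, fitness, lr=0.2):
    """Shift an (n, K) point-to-cluster probability model toward elite solutions.

    Each elite solution is a label vector; better solutions (higher fitness,
    e.g. log-likelihood) get larger rank-based weights and so pull harder.
    """
    n, _ = prob.shape
    ranks = np.argsort(np.argsort(fitness)) + 1  # 1 = worst, m = best
    w = ranks / ranks.sum()                      # weights sum to 1
    target = np.zeros_like(prob)
    for weight, labels in zip(w, elite_labels):
        target[np.arange(n), labels] += weight   # weighted one-hot vote per point
    return (1.0 - lr) * prob + lr * target       # each row still sums to 1


rng = np.random.default_rng(0)
init = crp_partition(20, alpha=1.0, rng=rng)
K = init.max() + 1
prob = np.full((20, K), 1.0 / K)                 # uniform starting model
elites = [init, rng.integers(0, K, size=20)]     # two toy candidate solutions
prob = weighted_model_update(prob, elites, fitness=[-30.0, -45.0])
print(K, prob.sum(axis=1))
```

Because the CRP only opens a new cluster with label `len(sizes)`, the sampled labels are always contiguous from 0, which keeps the probability model's column count equal to the number of clusters found.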

Compared with traditional EM algorithms, HEDA offers three advantages: (1) it determines the number of components K automatically, without manual tuning; (2) its strong global search makes it less likely to become trapped in local optima; (3) it is computationally efficient enough for large-scale data.
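For comparison, this is what the traditional EM baseline looks like for a one-dimensional GMM, as a minimal Python sketch (NumPy and SciPy assumed; not the paper's code). It needs K fixed in advance and converges to a local optimum that depends on initialization, the two weaknesses HEDA is described as addressing. The quantile-based initialization and the function name are choices made for this sketch.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm


def em_gmm_1d(x, K, n_iter=500, tol=1e-8):
    """Fit a 1-D Gaussian mixture with a user-chosen K by expectation-maximisation."""
    x = np.asarray(x, dtype=float)
    # Deterministic start: means at evenly spaced quantiles, shared variance, equal weights.
    means = np.quantile(x, (np.arange(K) + 0.5) / K)
    variances = np.full(K, x.var())
    weights = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities gamma[i, k] from the current parameters.
        log_p = np.log(weights) + norm.logpdf(x[:, None], means, np.sqrt(variances))
        log_norm = logsumexp(log_p, axis=1, keepdims=True)
        gamma = np.exp(log_p - log_norm)
        ll = float(log_norm.sum())  # log-likelihood of the current parameters
        if ll - prev_ll < tol:
            break                   # EM never decreases ll; stop once gains vanish
        prev_ll = ll
        # M-step: closed-form maximisers of the expected complete-data log-likelihood.
        nk = np.maximum(gamma.sum(axis=0), 1e-12)
        weights = nk / nk.sum()
        means = (gamma * x[:, None]).sum(axis=0) / nk
        variances = np.maximum((gamma * (x[:, None] - means) ** 2).sum(axis=0) / nk, 1e-6)
    return weights, means, variances


rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(5, 1, 600), rng.normal(12, 2, 400)])  # synthetic minutes
w, mu, var = em_gmm_1d(data, K=2)
print(np.round(w, 2), np.round(mu, 2), np.round(var, 2))
```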

4. Experimental Validation: Offline Testing and Online A/B Testing

Offline Experiment Design: The team validated the approach on real delivery data from Meituan’s June 2021 operations, comprising service time records for approximately 5 million orders. The data were split into a training set (the first three weeks) and a test set (the last week). The baselines were: (1) a single Gaussian model; (2) GMMs with fixed K (K = 3, 5, 7); and (3) histogram estimation.

Main Results: The HEDA algorithm outperformed the baselines on multiple metrics. Its Bayesian Information Criterion (BIC) score was 15.3% lower than that of the best baseline, indicating better model fit; log-likelihood improved by 12.7%, indicating more accurate probability estimation; and Mean Absolute Error (MAE) fell from 4.2 minutes to 3.1 minutes, a 26% improvement in prediction accuracy.
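For reference, the standard definition of BIC for a one-dimensional mixture with $K$ components is given below. The article does not state exactly how the scores were computed, so this is the textbook form, not necessarily the authors' variant. Here $\ln \hat{L}$ is the maximized value of the log-likelihood objective from Section 2, $N$ is the number of observations, and $p$ counts free parameters: $K$ means, $K$ variances, and $K-1$ weights (the last weight is fixed by the sum-to-one constraint).

$$\mathrm{BIC} = p \ln N - 2 \ln \hat{L}, \qquad p = 3K - 1$$

Lower is better: the $p \ln N$ term penalizes extra components, so adding one component lowers BIC only if it raises $\ln \hat{L}$ by more than $\tfrac{3}{2} \ln N$.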

Online A/B Testing: In July 2021, Meituan ran a three-week A/B test across 3 cities. The experimental group used the GMM-based uncertainty model to support order dispatching decisions, while the control group used traditional deterministic methods. In the experimental group, riders’ average delivery time fell by 8.5%, the on-time rate rose by 6.2 percentage points, and customer satisfaction rose by 3.8 percentage points.

Case Study: The team analyzed a typical case in depth: the weekday lunch peak in a commercial district. Traditional methods used a fixed service time estimate of 8 minutes, but the GMM identified two distinct patterns: fast food restaurants (mean 5 minutes, weight 60%) and full-service restaurants (mean 12 minutes, weight 40%). Using this insight, the dispatch system allocated tighter delivery time windows to fast food orders and more relaxed windows to full-service orders, improving overall delivery efficiency by 11%.
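The case-study numbers can be checked directly. A quick Python calculation (NumPy and SciPy assumed) shows that the mixture's mean, $0.6 \times 5 + 0.4 \times 12 = 7.8$ minutes, is close to the fixed 8-minute estimate, yet under the assumed spreads the density at 8 minutes is low compared with the two modes. The component standard deviations (1.5 and 2.5 minutes) are not reported in the article and are assumed purely for illustration.

```python
import numpy as np
from scipy.stats import norm

weights = np.array([0.6, 0.4])   # fast food, full service (case study)
means = np.array([5.0, 12.0])    # component means in minutes (case study)
stds = np.array([1.5, 2.5])      # ASSUMED spreads; the article does not report them

mixture_mean = float(weights @ means)  # 0.6 * 5 + 0.4 * 12 = 7.8 minutes


def mixture_pdf(t):
    """Density of the two-component service-time mixture at t minutes."""
    return float(weights @ norm.pdf(t, loc=means, scale=stds))


print(f"mixture mean: {mixture_mean:.1f} min")
for t in (5.0, 8.0, 12.0):
    print(f"density at {t:4.1f} min: {mixture_pdf(t):.3f}")
```

Under these assumptions a single 8-minute figure sits between the two modes, in a low-density region, which illustrates why per-pattern time windows can outperform one shared estimate.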

5. Critique and Limitations: Rational Academic Perspective

1. Research Assumption Limitations: A GMM assumes that service time follows a mixture of Gaussian distributions, but in certain extreme scenarios (severe weather, unexpected events) the service time distribution may deviate severely from this assumption, exhibiting long-tailed or skewed shapes. The model also assumes that service time patterns are stable in the short term, whereas in reality they may drift due to restaurant menu changes, rider turnover, and other factors.

2. Methodological Boundary Conditions: HEDA’s computational cost is relatively high. Even if it compares favorably with traditional EM algorithms, delivery platforms that make real-time decisions within milliseconds still have to balance accuracy against efficiency. The study mitigates this with offline training plus online lookup, but the weekly offline update frequency may be too slow to capture distribution changes promptly.
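The offline-training-plus-online-lookup pattern can be sketched as follows. The article does not describe the table format, so this hypothetical Python version (NumPy and SciPy assumed) precomputes per-segment quantiles of a fitted mixture offline, leaving only a dictionary lookup on the online path. The segment keys, parameter values, choice of quantiles, and 8-minute fallback are illustrative assumptions; in production such a table might sit in a cache such as Redis.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# Hypothetical per-segment GMM parameters (weights, means, stds; minutes), as an
# offline weekly job might produce them. Keys and values are illustrative only.
GMM_PARAMS = {
    ("fast_food", "lunch_peak"): ([0.7, 0.3], [4.5, 9.0], [1.0, 2.0]),
    ("full_service", "lunch_peak"): ([0.5, 0.5], [10.0, 15.0], [2.0, 3.0]),
}


def mixture_quantile(q, weights, means, stds):
    """Solve F(t) = q for the mixture CDF F by bracketed root finding."""
    w, m, s = (np.asarray(a, dtype=float) for a in (weights, means, stds))
    lo, hi = float((m - 10 * s).min()), float((m + 10 * s).max())  # F(lo) ~ 0, F(hi) ~ 1
    return brentq(lambda t: float(w @ norm.cdf(t, loc=m, scale=s)) - q, lo, hi)


def build_lookup(params, quantiles=(0.5, 0.9)):
    """Offline: precompute service-time quantiles for every segment."""
    return {seg: {q: mixture_quantile(q, *p) for q in quantiles} for seg, p in params.items()}


def lookup(table, segment, q=0.9, fallback=8.0):
    """Online: constant-time dictionary lookup, degrading to a fixed value if missing."""
    return table.get(segment, {}).get(q, fallback)


table = build_lookup(GMM_PARAMS)
print(lookup(table, ("fast_food", "lunch_peak")))   # 90th-percentile service time
print(lookup(table, ("noodle_bar", "late_night")))  # unknown segment -> fixed 8.0 fallback
```

Precomputing quantiles keeps online cost independent of the number of mixture components; the trade-off is that the table is only as fresh as the last offline run, which is exactly the update-frequency concern raised above.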

3. Experimental Design Shortcomings: The offline experiments used historical data subject to selection bias: they observe only the service times of orders that actually occurred, not counterfactual scenarios (what would have happened under different dispatch strategies). The online A/B test covered only 3 cities, which limits how representative the sample is, and the three-week test period leaves long-term effects unknown (e.g., behavioral changes as riders adapt to the new system).

4. External Validity Concerns: This study uses data from China’s largest food delivery platform, so whether its conclusions generalize to other settings is uncertain. For instance, Western delivery platforms (Uber Eats, DoorDash) may differ in restaurant types, delivery distances, and rider models, while fresh food delivery and express logistics have different timeliness requirements and service time distributions.

6. Practical Implications: Implementation Guide for Supply Chain Practitioners

1. Technical Implementation Path:

Data Preparation: At least one month of historical delivery data is required, with fields including order ID, restaurant ID, rider ID, arrival time at the restaurant, pickup departure time, restaurant type, time of day, and weather conditions. A minimum of 500,000 orders is recommended so that the GMM parameter estimates are statistically reliable.
Technology Stack: Python 3.8+ (data processing), scikit-learn 1.0+ (GMM baseline implementation), PyTorch 1.9+ (custom HEDA algorithm), Redis (online lookup caching). Server specifications: 16-core CPU, 64GB RAM, supporting 50,000+ service time prediction requests per second.
Implementation Steps: Step 1: Clean the historical data and remove anomalies (e.g., service times above 60 minutes); Step 2: Train the GMM and select the optimal K using BIC; Step 3: Validate prediction accuracy (target MAE below 4 minutes); Step 4: Deploy the online service and integrate it into the dispatch system; Step 5: Run an A/B test to validate the model before full deployment.
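Steps 1 and 2 might look like this with scikit-learn's `GaussianMixture`, as a minimal sketch on synthetic data. The 60-minute cutoff is taken from Step 1 and read as an upper bound; dropping non-positive values, the candidate range K = 1 to 7, and the function name `fit_service_time_gmm` are assumptions. scikit-learn fits each K with EM rather than HEDA, consistent with its role as the baseline implementation in the technology stack.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_service_time_gmm(service_minutes, k_range=range(1, 8), max_minutes=60.0, seed=0):
    """Step 1: drop anomalies; Step 2: fit one GMM per K and keep the lowest-BIC model."""
    x = np.asarray(service_minutes, dtype=float)
    x = x[(x > 0) & (x <= max_minutes)].reshape(-1, 1)  # scikit-learn expects a 2-D array
    best_model, best_bic = None, np.inf
    for k in k_range:
        model = GaussianMixture(n_components=k, random_state=seed).fit(x)
        bic = model.bic(x)  # lower BIC = better fit/complexity trade-off
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model


rng = np.random.default_rng(0)
# Synthetic service times: a quick mode, a slow mode, and two anomalous records.
data = np.concatenate([rng.normal(5, 1, 3000), rng.normal(12, 2, 2000), [-1.0, 95.0]])
model = fit_service_time_gmm(data)
print(model.n_components, np.sort(model.means_.ravel()))
```

On real data, fitting one model per segment (restaurant type, time slot, weather) rather than pooling all orders is one way to capture the distinct patterns described in Section 2.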

2. Implementation Cost and ROI Estimation:

Development Cost: Requires one algorithm engineer (2 months), one backend engineer (1 month), and one data engineer (2 weeks). At tier-1 city salaries, labor costs are approximately 500,000-800,000 RMB.
Operational Cost: Servers cost about 10,000-20,000 RMB/month, and weekly model retraining requires an additional 5,000 RMB in compute resources.
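Expected Returns: For a delivery platform with 100 million monthly orders and rider costs of 8 RMB/order (800 million RMB/month), an 8.5% reduction in average delivery time would save approximately 68 million RMB/month (800 million × 8.5%), assuming rider costs fall in proportion to delivery time. Monthly operating costs of a few tens of thousands of RMB are negligible by comparison, and the one-off development cost of 500,000-800,000 RMB would be recovered within the first month.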

3. Applicable Scenarios and Enterprise Types:

High-Applicability Scenarios: On-demand delivery (food, groceries, pharmaceuticals), ride-hailing dispatch, sharing economy platforms, dynamic service pricing.
Enterprise Scale Recommendation: Medium-to-large enterprises with 100,000+ daily orders are more suitable. Small enterprises with low order volumes lack sufficient historical data for effective GMM models; simplified rules are recommended.
Inapplicable Scenarios: Highly deterministic service time scenarios (standardized product delivery), extremely low order volume scenarios, industries with strict regulations prohibiting differentiated services.

4. Implementation Risks and Mitigation:

Model Drift Risk: Service time distributions may change over time. Mitigation: retrain the model weekly and set drift-detection alerts (e.g., a Kolmogorov-Smirnov test).
Fairness Concerns: Giving different restaurants or riders different time windows may cause dissatisfaction. Mitigation: publish transparent rules (e.g., the service time calculation formula) and set reasonable upper and lower bounds.
System Stability Risk: Online service failures may interrupt dispatch. Mitigation: Implement degradation strategies (switch to fixed rules during failures), multi-active deployment, real-time monitoring.
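A KS-based drift alert could be sketched like this in Python using SciPy's `ks_2samp`. The thresholds `alpha=0.01` and `min_statistic=0.05`, the function name, the weekly windows, and the synthetic data are assumptions, not values from the article.

```python
import numpy as np
from scipy.stats import ks_2samp


def service_time_drift(reference, recent, alpha=0.01, min_statistic=0.05):
    """Flag drift when a two-sample KS test is significant AND the gap is material.

    With millions of orders almost any difference is 'significant', so a minimum
    KS statistic (the largest gap between the two empirical CDFs) is also required.
    """
    result = ks_2samp(reference, recent)
    drifted = bool(result.pvalue < alpha and result.statistic >= min_statistic)
    return drifted, float(result.statistic), float(result.pvalue)


rng = np.random.default_rng(1)
last_week = rng.normal(8.0, 3.0, 5000)  # synthetic reference window (minutes)
this_week = rng.normal(9.5, 3.0, 5000)  # synthetic window with slower service
print(service_time_drift(last_week, this_week))
```

Requiring both a small p-value and a minimum KS statistic keeps the alert from firing on trivial shifts once sample sizes reach platform scale.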

7. Paper Citation

Title: Modeling Stochastic Service Time for Complex On-Demand Food Delivery

Authors: Jie Zheng, Ling Wang (Department of Automation, Tsinghua University); Xuetao Ding, Shengyao Wang, Jing-fang Chen, Xing Wang, Haining Duan, Yile Liang (Meituan)

Venue:

Journal: Complex & Intelligent Systems
Year: 2022
Volume: 8, pp. 4939-4953

Links:

DOI: 10.1007/s40747-022-00719-4
Springer Link: https://link.springer.com/article/10.1007/s40747-022-00719-4

Impact:

Google Scholar Citations: Approximately 65 citations as of February 2026
Industry Application: Fully deployed on China’s largest food delivery platform, processing 30 million daily orders
Academic Impact: Cited by top transportation journals like Transportation Research Part C
