
Transportation Science Paper: RL + Hyper-Heuristics Cut Meituan Meal Delivery Costs by 12%

2026/02/18
in Papers

Transportation Science Paper: How Reinforcement Learning + Hyper-Heuristics Cut Meituan’s Meal Delivery Costs by 12%

Meal delivery looks simple (accept the order, pick up, deliver), but at Meituan’s scale of 30+ million daily orders it is a very hard real-time combinatorial optimization problem. Up to 260 new orders arrive per minute, tens of thousands of couriers move through cities, and each dispatch decision alters the system’s future state. Traditional greedy dispatching (always assign the nearest courier) seems reasonable but ignores a critical fact: an assignment that is optimal right now can leave the system badly positioned a few minutes later.

A research team of Ramón Auad, Felipe Lagos, and Tomás Lagos, affiliated with Amazon, Universidad Católica del Norte, Universidad Adolfo Ibáñez, and the University of Sydney, submitted a paper to Transportation Science (one of the top journals in transportation research). It proposes a hybrid framework that combines reinforcement learning with hyper-heuristic optimization, validated in a simulation built from Meituan’s real operational data. The headline result is a 12% cost reduction through “strategic order postponement,” with the largest improvements during peak hours with courier shortages.

Core Problem: Why “Nearest Courier Takes Nearest Order” Is a Terrible Strategy

Meal delivery platforms face two core decisions: dispatching (which courier takes which order) and routing (in what sequence the courier picks up and delivers). The two problems are tightly coupled, change dynamically, and are subject to uncertainty; the underlying assignment-and-routing problem is NP-hard.

Traditional methods treat each time window independently, minimizing current-period costs. The problem: this “myopic” strategy ignores the long-term effects of sequential decisions. Sending a courier to a distant delivery removes them from covering urgent new orders in their zone over the next 5 minutes. Waiting might bring a closer courier into range, or allow bundling of directionally similar orders.

The paper formalizes this as a Sequential Decision Process, explicitly modeling dynamic system state evolution. Each dispatch decision has both immediate cost and downstream effects on courier distribution and order wait times. This modeling makes “don’t dispatch now — wait for a better match” a legitimate, evaluable strategy.
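
The trade-off between dispatching now and waiting can be illustrated with a toy one-step comparison. This is only a sketch with made-up numbers, not the paper’s model: the function names, the per-kilometer cost, the arrival probability of a nearer courier, and the delay penalty are all hypothetical.

```python
def dispatch_now_cost(far_km, cost_per_km=1.0):
    """Cost of sending the only courier currently available (hypothetical units)."""
    return far_km * cost_per_km


def expected_wait_cost(far_km, near_km, p_near_arrives, delay_penalty, cost_per_km=1.0):
    """Expected cost of postponing the order by one period.

    With probability p_near_arrives a nearer courier becomes available;
    otherwise we fall back to the far courier. Waiting always incurs
    delay_penalty (an assumed lateness cost).
    """
    expected_km = p_near_arrives * near_km + (1 - p_near_arrives) * far_km
    return delay_penalty + expected_km * cost_per_km


def best_action(far_km, near_km, p_near_arrives, delay_penalty):
    """Pick the cheaper of the two options under this one-step lookahead."""
    now = dispatch_now_cost(far_km)
    wait = expected_wait_cost(far_km, near_km, p_near_arrives, delay_penalty)
    return "dispatch" if now <= wait else "postpone"


if __name__ == "__main__":
    # A likely nearby courier and a small delay penalty favor waiting.
    print(best_action(far_km=3, near_km=1, p_near_arrives=0.5, delay_penalty=0.5))
    # A large delay penalty favors dispatching immediately.
    print(best_action(far_km=3, near_km=1, p_near_arrives=0.5, delay_penalty=2.0))
```

A real system looks many steps ahead and across many couriers and orders at once, which is why the paper turns to reinforcement learning rather than a closed-form comparison like this one.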

Technical Approach: n-step SARSA + Multi-Armed Bandit Hyper-Heuristic

The framework consists of two layers that together address the “action space explosion” problem that plagues RL in combinatorial optimization:

Upper layer: n-step SARSA reinforcement learning. Unlike the better-known Q-learning, SARSA is on-policy: it learns the value of the policy it is actually following rather than that of the greedy optimal policy. That tends to produce more conservative, stable behavior, which suits meal delivery. The n-step extension lets each update use the rewards observed over the next n steps instead of a single step, so the delayed consequences of a dispatch decision reach the value estimate sooner. The researchers use linear value function approximation for scalability: neural networks might be more precise, but at 260 orders per minute, inference speed must be prioritized.
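
As a rough illustration of the upper layer, the following is a minimal sketch of the textbook semi-gradient n-step SARSA update with a linear value function. It is not the authors’ code: the feature vectors, step size `alpha`, and discount `gamma` are placeholder assumptions, and in a cost-minimization setting the rewards would typically be negative costs.

```python
def dot(weights, features):
    """Linear value estimate q(s, a) = w . phi(s, a)."""
    return sum(w * x for w, x in zip(weights, features))


def n_step_sarsa_update(weights, features_t, rewards, features_tn,
                        alpha=0.1, gamma=0.9):
    """Return updated weights after one semi-gradient n-step SARSA update.

    features_t  : phi(s_t, a_t), features of the state-action being updated
    rewards     : [r_{t+1}, ..., r_{t+n}], the n rewards observed after it
    features_tn : phi(s_{t+n}, a_{t+n}) for the action actually taken at
                  step t+n (on-policy), or None if the episode has ended
    """
    n = len(rewards)
    # Discounted sum of the n observed rewards.
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Bootstrap from the current estimate at step t+n, if there is one.
    if features_tn is not None:
        g += (gamma ** n) * dot(weights, features_tn)
    td_error = g - dot(weights, features_t)
    # For a linear model the gradient of q with respect to w is phi itself.
    return [w + alpha * td_error * x for w, x in zip(weights, features_t)]


if __name__ == "__main__":
    w = n_step_sarsa_update([0.0, 0.0], [1.0, 0.0], rewards=[1, 1],
                            features_tn=[0.0, 1.0], alpha=0.5, gamma=1.0)
    print(w)
```

Because both prediction and update are a single dot product over the feature vector, the per-decision cost stays small, which is consistent with the speed argument made for linear approximation.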

Lower layer: Multi-Armed Bandit (MAB) hyper-heuristic. This is the paper’s most original design element. At each decision point, the system faces not a simple “A or B” choice but tens of thousands of possible courier-order matching combinations. The authors designed 7 specialized low-level heuristics (nearest-match, load-balancing, delay-tolerant, etc.), then used a MAB algorithm to dynamically select the most appropriate heuristic for the current system state. Choosing which heuristic to apply, rather than choosing the assignment itself, is what makes this a hyper-heuristic, and it avoids searching directly in the enormous action space.
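
The selection idea can be sketched with a standard UCB1 bandit whose arms are heuristics. The summary above does not say which bandit algorithm the authors use, so UCB1 is an assumption here, as are the class name, the arm names, and the reward scale. This simple version also ignores the system state, which the paper’s selector takes into account.

```python
import math


class UCB1Selector:
    """Pick one low-level heuristic per decision point using UCB1."""

    def __init__(self, arms, c=2.0):
        self.arms = list(arms)
        self.c = c  # exploration weight (assumed value)
        self.counts = {arm: 0 for arm in self.arms}
        self.means = {arm: 0.0 for arm in self.arms}

    def select(self):
        # Try every heuristic once before comparing them.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        total = sum(self.counts.values())

        def score(arm):
            bonus = math.sqrt(self.c * math.log(total) / self.counts[arm])
            return self.means[arm] + bonus

        return max(self.arms, key=score)

    def update(self, arm, reward):
        # Incremental running mean of the reward observed for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


if __name__ == "__main__":
    selector = UCB1Selector(["nearest_match", "load_balancing", "postpone"])
    for reward in (0.0, 1.0, 0.0):  # hypothetical rewards, e.g. negative cost
        arm = selector.select()
        selector.update(arm, reward)
    print(selector.select())
```

The bandit only ever chooses among a handful of arms, so its cost is independent of how many courier-order combinations exist; the chosen heuristic then builds the actual assignment.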

Simulation Environment: Rebuilding the Delivery World with Meituan’s Real Data

Another major contribution is the high-fidelity simulation environment built from Meituan’s actual operational data, capturing multiple critical real-world features:

  • Order dynamics: Arrival patterns follow real temporal patterns — lunch peak vs. dinner peak, weekday vs. weekend distributions
  • Courier behavior: Couriers aren’t robots. They reject certain orders (especially long-distance orders or orders in bad weather), have zone preferences, and vary in online/offline timing. ML models predict order acceptance probability
  • Stochastic service times: Restaurant preparation time and delivery time (traffic, building floors) are random variables, modeled with gradient boosting trees
  • Time window constraints: Each order has a promised delivery time; violations mean compensation and rating penalties
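
To give a feel for the first ingredient, here is a minimal sketch of time-varying order arrivals generated by thinning a Poisson process. It covers arrivals only, not courier behavior or service times, and it is not the authors’ simulator: the rates of 1 and 4 orders per minute are invented, and only the lunch and dinner peak windows (11:30-13:00 and 17:30-20:00) come from the article.

```python
import random


def arrival_rate(minute_of_day):
    """Hypothetical order rate (orders per minute), higher during meal peaks."""
    hour = minute_of_day / 60
    if 11.5 <= hour < 13 or 17.5 <= hour < 20:
        return 4.0
    return 1.0


def simulate_arrivals(start_min, end_min, rng, max_rate=4.0):
    """Sample arrival times in [start_min, end_min) by thinning.

    Candidate events are drawn at the constant rate max_rate and each is
    kept with probability arrival_rate(t) / max_rate, which yields a
    non-homogeneous Poisson process as long as arrival_rate <= max_rate.
    """
    t = start_min
    arrivals = []
    while True:
        t += rng.expovariate(max_rate)
        if t >= end_min:
            return arrivals
        if rng.random() < arrival_rate(t) / max_rate:
            arrivals.append(t)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    times = simulate_arrivals(600, 780, rng)  # 10:00 to 13:00
    off_peak = sum(1 for t in times if t < 690)
    peak = sum(1 for t in times if t >= 690)
    print(off_peak, peak)
```

A fuller simulator would attach to each arrival a promised delivery time, a sampled preparation and travel time, and a courier acceptance draw, with those distributions fitted from data as the paper describes.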

Notably, the researchers acknowledge a limitation: due to computational constraints, experiments ran on scaled-down instances rather than Meituan’s full-scale operations. Extending the framework to the full scale of 260 orders per minute remains a future research direction. This kind of candor is uncommon, and valuable, in industry-partnered papers.

Key Findings: More Than an Algorithmic Victory

1. “Strategic order postponement” delivers 12% cost reduction. The most counterintuitive finding: deliberately delaying some dispatches is more efficient than always dispatching immediately. The algorithm learned to wait in certain situations: for new couriers to enter a zone, for directionally similar new orders that enable bundled delivery, or for pressure in overloaded areas to ease on its own.

2. Greatest improvements during peak + courier shortage. When couriers are abundant, most dispatch rules perform well because surplus supply leaves ample choices. The real differentiation occurs in extreme scenarios: lunch peak 11:30-13:00, dinner peak 17:30-20:00, and bad weather that pushes couriers offline. In these scenarios, myopic strategies create “cascading failures” (send courier far → zone under-served → more timeouts → forced expediting → costs spike) that the RL policy helps avoid.

3. Adding 10% more couriers beats algorithmic improvements. Perhaps the paper’s most practically valuable finding. A 10% increase in courier availability yields greater cost reduction than upgrading from baseline algorithms to the RL framework. For delivery platforms, fleet supply management (recruitment, incentives, retention) may deliver higher ROI than dispatch algorithm optimization. Algorithms and fleet capacity are complementary, not substitutes — the optimal strategy invests in both.

Strategic Implications for the Logistics Industry

1. “Delayed decisions” are an undervalued optimization lever. Under instant-delivery pressure, operations teams default to “dispatch as fast as possible.” The paper’s simulation results indicate that under the right conditions (bundling opportunities, new resources arriving soon), disciplined waiting outperforms hasty action. The same principle plausibly extends beyond food delivery to parcel sorting, ride-hailing dispatch, and warehouse task assignment.

2. Extreme-scenario performance is the real competitive differentiator. Most competitors provide adequate service during normal periods. What determines user retention and brand reputation is service quality during peaks, bad weather, and incidents. Concentrating algorithmic resources on extreme scenarios may deliver more business value than pursuing across-the-board average improvement.

3. Capacity is the primary productive force. Algorithms cannot conjure couriers from thin air. No matter how sophisticated the algorithm, severe courier shortage leaves limited optimization room. For instant-delivery platforms, courier recruitment and retention strategies should share equal priority with technology investment. The smartest approach: use algorithmic optimization to improve courier experience (better routes, less dead mileage), which in turn improves retention.

4. Simulation is the bridge to production. The high-fidelity simulation environment is itself a major asset. Validating RL algorithms in simulation before deployment avoids the risk of “experimenting” on real orders. For any company considering AI in logistics operations, the first investment should be building the most realistic simulation environment possible, not deploying models directly.

Source: Auad, R., Lagos, F., & Lagos, T. “Data-Driven Optimization for Meal Delivery: A Reinforcement Learning Approach for Order-Courier Assignment and Routing at Meituan.” Submitted to Transportation Science. | Amazon / Universidad Católica del Norte / Universidad Adolfo Ibáñez / University of Sydney | First INFORMS TSL Data-Driven Research Challenge
