Double Deep Q-Learning Network for Package Pickup and Delivery Route Prediction

Metages Molla Gubena, Yuli Zhang*, Ziqian Ma, Xiaoyu Ma, Yucai Yang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Package pickup and delivery services are essential for real-time logistics, where riders must navigate operational constraints (e.g., delivery deadlines, pickup-delivery precedence), personalized routing preferences, and dynamic demand fluctuations. Existing approaches exhibit critical limitations: combinatorial optimization (CO) often ignores rider preferences; deep learning (DL) struggles to enforce constraints; and deep reinforcement learning (DRL) faces sparse rewards and training instability. To address these challenges, we propose a novel hybrid framework, the double deep Q-learning network for package pickup and delivery route prediction (DDQN-PPDRP), that integrates double deep Q-networks with constraint-aware optimization. Our method introduces three novel components: (1) an ensemble network combining a traveling salesman model with time windows for constraint satisfaction and an enhanced DeepRoute transformer (i-DeepRoute) with precedence-aware masking to model rider-specific routing behavior; (2) dual-reward functions incorporating complete match rate (CMR) for strict route alignment and edit distance (ED) for incremental feedback, alleviating reward sparsity; and (3) adaptive prioritized experience replay (APER) with a double-buffer strategy to stabilize training. Evaluated on real-world data from Dada Express, DDQN-PPDRP achieves significant improvements over state-of-the-art baselines: a 29.14% reduction in location square deviation, 48.33% lower ED, 19.20% higher CMR, and 6.67% improved rank correlation. It also reduces network parameters by 85% compared to pure DL methods and cuts training time by 57.7% through APER. These results demonstrate that DDQN-PPDRP effectively balances operational constraints, rider preferences, and computational efficiency, harmonizing CO's interpretability with DRL's adaptability to advance scalable, constraint-compliant route prediction in dynamic logistics environments.
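The dual-reward idea described above (a strict complete-match signal combined with dense edit-distance feedback) could be sketched roughly as follows. This is an illustrative assumption of how such a reward might be composed, not the paper's exact formulation; the weighting `alpha` and the normalization of the edit distance are hypothetical choices.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two route sequences (DP)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # deleting i items from a
    for j in range(n + 1):
        dp[0][j] = j  # inserting j items of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def dual_reward(predicted, actual, alpha=0.5):
    """Blend a sparse exact-match term (CMR-style) with a dense
    edit-distance term; alpha is a hypothetical mixing weight."""
    cmr = 1.0 if predicted == actual else 0.0
    ed = edit_distance(predicted, actual)
    dense = 1.0 - ed / max(len(predicted), len(actual), 1)
    return alpha * cmr + (1 - alpha) * dense
```

A perfect route prediction receives the maximum reward from both terms, while a near-miss still earns partial credit from the dense term, which is the mechanism the abstract credits with alleviating reward sparsity.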

Original language: English
Journal: IEEE Transactions on Consumer Electronics
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • adaptive prioritized experience replay
  • double deep Q-learning network
  • dual-reward function
  • ensemble learning
  • pickup and delivery route prediction
