
Double Deep Q-Learning Network for Package Pickup and Delivery Route Prediction

  • Metages Molla Gubena
  • , Yuli Zhang*
  • , Ziqian Ma
  • , Xiaoyu Ma
  • , Yucai Yang
  • *Corresponding author for this work
  • Beijing Institute of Technology
  • Debre Markos University
  • Ant Smart Information Technology (Shanghai) Company Ltd.
  • Beijing Foreign Studies University
  • JD Group’s Dada Business Division

Research output: Contribution to journal › Article › peer-review

Abstract

Package pickup and delivery services are essential for real-time logistics, where riders must navigate operational constraints (e.g., delivery deadlines, pickup-delivery precedence), personalized routing preferences, and dynamic demand fluctuations. Existing approaches exhibit critical limitations: combinatorial optimization (CO) often ignores rider preferences; deep learning (DL) struggles to enforce constraints; and deep reinforcement learning (DRL) faces sparse rewards and training instability. To address these challenges, we propose a novel hybrid framework—the double deep Q-learning network for package pickup and delivery route prediction (DDQN-PPDRP)—that integrates double deep Q-networks with constraint-aware optimization. Our method introduces three novel components: (1) an ensemble network combining a traveling salesman model with time windows for constraint satisfaction and an enhanced DeepRoute transformer (i-DeepRoute) with precedence-aware masking to model rider-specific routing behavior; (2) dual-reward functions incorporating complete match rate (CMR) for strict route alignment and edit distance (ED) for incremental feedback, alleviating reward sparsity; and (3) adaptive prioritized experience replay (APER) with double-buffer strategy to stabilize training. Evaluated on real-world data from Dada Express, DDQN-PPDRP achieves significant improvements over state-of-the-art baselines: a 29.14% reduction in location square deviation, 48.33% lower ED, 19.20% higher CMR, and 6.67% improved rank correlation. It also reduces network parameters by 85% compared to pure DL methods and cuts training time by 57.7% through APER. These results demonstrate that DDQN-PPDRP effectively balances operational constraints, rider preferences, and computational efficiency, harmonizing CO’s interpretability with DRL’s adaptability to advance scalable, constraint-compliant route prediction in dynamic logistics environments.
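The dual-reward idea in component (2) — a sparse bonus for an exact route match (CMR) plus dense, incremental feedback from edit distance (ED) — can be illustrated with a minimal sketch. This is not the paper's implementation; the `dual_reward` function, its weights `w_cmr`/`w_ed`, and the length-normalized ED term are illustrative assumptions.

```python
def edit_distance(pred, true):
    """Levenshtein distance between two stop sequences, via the standard DP table."""
    m, n = len(pred), len(true)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all i prefix stops
    for j in range(n + 1):
        dp[0][j] = j          # insert all j target stops
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == true[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,       # deletion
                           dp[i][j - 1] + 1,       # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def dual_reward(pred, true, w_cmr=1.0, w_ed=1.0):
    """Sparse exact-match bonus (CMR) plus dense ED penalty (hypothetical weighting)."""
    cmr = 1.0 if pred == true else 0.0
    ed_term = -edit_distance(pred, true) / max(len(pred), len(true), 1)
    return w_cmr * cmr + w_ed * ed_term
```

A predicted route identical to the rider's actual route earns the full CMR bonus, while a near-miss (e.g. two stops swapped) still receives graded feedback through the ED term instead of a flat zero, which is how such a shaping term alleviates reward sparsity.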

Original language: English
Pages (from-to): 7503-7522
Number of pages: 20
Journal: IEEE Transactions on Consumer Electronics
Volume: 71
Issue: 3
DOI
Publication status: Published - 2025
Externally published: Yes
