TY - JOUR
T1 - Trajectory Design for UAV-Based Internet of Things Data Collection
T2 - A Deep Reinforcement Learning Approach
AU - Wang, Yang
AU - Gao, Zhen
AU - Zhang, Jun
AU - Cao, Xianbin
AU - Zheng, Dezhi
AU - Gao, Yue
AU - Ng, Derrick Wing Kwan
AU - Di Renzo, Marco
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2022/3/1
Y1 - 2022/3/1
N2 - In this article, we investigate an unmanned aerial vehicle (UAV)-assisted Internet of Things (IoT) system in a sophisticated 3-D environment, where the UAV's trajectory is optimized to efficiently collect data from multiple IoT ground nodes. Unlike existing approaches that focus only on a simplified 2-D scenario and assume the availability of perfect channel state information (CSI), this article considers a practical 3-D urban environment with imperfect CSI, where the UAV's trajectory is designed to minimize the data collection completion time subject to practical throughput and flight movement constraints. Specifically, inspired by state-of-the-art deep reinforcement learning approaches, we leverage the twin-delayed deep deterministic policy gradient (TD3) method to design the UAV's trajectory and present a TD3-based trajectory design for completion time minimization (TD3-TDCTM) algorithm. In particular, we introduce an additional piece of information, i.e., the merged pheromone, to represent the state of the UAV and the environment, which serves as a reference for the reward and facilitates the algorithm design. By taking the service statuses of the IoT nodes, the UAV's position, and the merged pheromone as input, the proposed algorithm can continuously and adaptively learn how to adjust the UAV's movement strategy. By interacting with the external environment in the corresponding Markov decision process, the proposed algorithm can achieve a near-optimal navigation strategy. Our simulation results show the superiority of the proposed TD3-TDCTM algorithm over three conventional nonlearning-based baseline methods.
AB - In this article, we investigate an unmanned aerial vehicle (UAV)-assisted Internet of Things (IoT) system in a sophisticated 3-D environment, where the UAV's trajectory is optimized to efficiently collect data from multiple IoT ground nodes. Unlike existing approaches that focus only on a simplified 2-D scenario and assume the availability of perfect channel state information (CSI), this article considers a practical 3-D urban environment with imperfect CSI, where the UAV's trajectory is designed to minimize the data collection completion time subject to practical throughput and flight movement constraints. Specifically, inspired by state-of-the-art deep reinforcement learning approaches, we leverage the twin-delayed deep deterministic policy gradient (TD3) method to design the UAV's trajectory and present a TD3-based trajectory design for completion time minimization (TD3-TDCTM) algorithm. In particular, we introduce an additional piece of information, i.e., the merged pheromone, to represent the state of the UAV and the environment, which serves as a reference for the reward and facilitates the algorithm design. By taking the service statuses of the IoT nodes, the UAV's position, and the merged pheromone as input, the proposed algorithm can continuously and adaptively learn how to adjust the UAV's movement strategy. By interacting with the external environment in the corresponding Markov decision process, the proposed algorithm can achieve a near-optimal navigation strategy. Our simulation results show the superiority of the proposed TD3-TDCTM algorithm over three conventional nonlearning-based baseline methods.
KW - Data collection
KW - Internet of Things (IoT)
KW - deep reinforcement learning (DRL)
KW - trajectory design
KW - unmanned aerial vehicle (UAV) communications
UR - http://www.scopus.com/inward/record.url?scp=85112628598&partnerID=8YFLogxK
U2 - 10.1109/JIOT.2021.3102185
DO - 10.1109/JIOT.2021.3102185
M3 - Article
AN - SCOPUS:85112628598
SN - 2327-4662
VL - 9
SP - 3899
EP - 3912
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 5
ER -