TY - GEN
T1 - Video2Reward
T2 - 27th European Conference on Artificial Intelligence, ECAI 2024
AU - Zeng, Runhao
AU - Zhou, Dingjie
AU - Liang, Qiwei
AU - Liu, Junlin
AU - Li, Hui
AU - Huang, Changxin
AU - Li, Jianqiang
AU - Hu, Xiping
AU - Sun, Fuchun
N1 - Publisher Copyright:
© 2024 The Authors.
PY - 2024/10/16
Y1 - 2024/10/16
N2 - Learning behavior in legged robots presents a significant challenge due to its inherent instability and complex constraints. Recent research has proposed the use of a large language model (LLM) to generate reward functions in reinforcement learning, thereby replacing the need for manually designed rewards by experts. However, this approach, which relies on textual descriptions to define learning objectives, fails to achieve controllable and precise behavior learning with clear directionality. In this paper, we introduce a new video2reward method, which directly generates reward functions from videos depicting the behaviors to be mimicked and learned. Specifically, we first process videos containing the target behaviors, converting the motion information of individuals in the videos into keypoint trajectories represented as coordinates through a video2text transforming module. These trajectories are then fed into an LLM to generate the reward function, which in turn is used to train the policy. To enhance the quality of the reward function, we develop a video-assisted iterative reward refinement scheme that visually assesses the learned behaviors and provides textual feedback to the LLM. This feedback guides the LLM to continually refine the reward function, ultimately facilitating more efficient behavior learning. Experimental results on tasks involving bipedal and quadrupedal robot motion control demonstrate that our method surpasses the performance of state-of-the-art LLM-based reward generation methods by over 37.6% in terms of human normalized score. More importantly, by switching video inputs, we find our method can rapidly learn diverse motion behaviors such as walking and running.
AB - Learning behavior in legged robots presents a significant challenge due to its inherent instability and complex constraints. Recent research has proposed the use of a large language model (LLM) to generate reward functions in reinforcement learning, thereby replacing the need for manually designed rewards by experts. However, this approach, which relies on textual descriptions to define learning objectives, fails to achieve controllable and precise behavior learning with clear directionality. In this paper, we introduce a new video2reward method, which directly generates reward functions from videos depicting the behaviors to be mimicked and learned. Specifically, we first process videos containing the target behaviors, converting the motion information of individuals in the videos into keypoint trajectories represented as coordinates through a video2text transforming module. These trajectories are then fed into an LLM to generate the reward function, which in turn is used to train the policy. To enhance the quality of the reward function, we develop a video-assisted iterative reward refinement scheme that visually assesses the learned behaviors and provides textual feedback to the LLM. This feedback guides the LLM to continually refine the reward function, ultimately facilitating more efficient behavior learning. Experimental results on tasks involving bipedal and quadrupedal robot motion control demonstrate that our method surpasses the performance of state-of-the-art LLM-based reward generation methods by over 37.6% in terms of human normalized score. More importantly, by switching video inputs, we find our method can rapidly learn diverse motion behaviors such as walking and running.
UR - http://www.scopus.com/inward/record.url?scp=85216679053&partnerID=8YFLogxK
U2 - 10.3233/FAIA241014
DO - 10.3233/FAIA241014
M3 - Conference contribution
AN - SCOPUS:85216679053
T3 - Frontiers in Artificial Intelligence and Applications
SP - 4369
EP - 4376
BT - ECAI 2024 - 27th European Conference on Artificial Intelligence, Including 13th Conference on Prestigious Applications of Intelligent Systems, PAIS 2024, Proceedings
A2 - Endriss, Ulle
A2 - Melo, Francisco S.
A2 - Bach, Kerstin
A2 - Bugarin-Diz, Alberto
A2 - Alonso-Moral, Jose M.
A2 - Barro, Senen
A2 - Heintz, Fredrik
PB - IOS Press BV
Y2 - 19 October 2024 through 24 October 2024
ER -