Imitation Learning Method of Multi-quality Expert Data Based on GAIL

Dengmin Xiao, Bo Wang, Zhongqi Sun, Xiao He

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

This paper studies imitation learning from multi-quality expert data based on Generative Adversarial Imitation Learning (GAIL). With GAIL, an agent can acquire a high-quality behavioral policy by imitating expert actions and learning the distribution of expert experience rather than a reward function. Because imitation learning from experts of differing quality can achieve the effect of data augmentation, a novel GAIL-based method named MT-GAIL is proposed. We first define a reliability coefficient for each set of expert data by calculating the accuracy of the corresponding discriminator. Each reliability coefficient is then used as a weight in the reward function, which is defined as the sum of the products of the weights and the outputs of the corresponding discriminators. The resulting sequences of rewards, states, and actions are finally fed into the experience pool to train the policy builder's network. Experiments compare the GAIL method on single-expert and multi-quality expert trajectories and show that the proposed MT-GAIL method is capable of avoiding the worst expert data. The effects of different reward calculation methods on multi-quality expert data are also examined to illustrate the advantage of the proposed weighting of discriminator outputs.
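The weighted reward described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract says the reliability coefficient is derived from discriminator accuracy but does not give the formula, so the normalization below (accuracies divided by their sum) is an assumption, and the names reliability_coefficients, weighted_reward, and experience_pool are hypothetical.

```python
from typing import List, Sequence, Tuple


def reliability_coefficients(accuracies: Sequence[float]) -> List[float]:
    """Turn per-expert discriminator accuracies into weights that sum to 1.

    Assumption: plain normalization. The abstract only states that the
    coefficient is computed from the discriminator's accuracy.
    """
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("at least one accuracy must be positive")
    return [a / total for a in accuracies]


def weighted_reward(weights: Sequence[float], disc_outputs: Sequence[float]) -> float:
    """Reward for one (state, action) pair: sum of weight * discriminator output."""
    if len(weights) != len(disc_outputs):
        raise ValueError("need one discriminator output per weight")
    return sum(w * d for w, d in zip(weights, disc_outputs))


# Illustrative values for three expert datasets, one discriminator each.
accuracies = [0.5, 0.25, 0.25]
weights = reliability_coefficients(accuracies)

# Hypothetical discriminator outputs for a single (state, action) pair.
disc_outputs = [0.5, 0.25, 0.75]
reward = weighted_reward(weights, disc_outputs)

# The (state, action, reward) tuple is stored for later policy training.
experience_pool: List[Tuple[str, str, float]] = []
experience_pool.append(("state_0", "action_0", reward))

print(weights, reward)
```

With these illustrative numbers the weights are 0.5, 0.25, and 0.25, and the reward is 0.5. Any other mapping from accuracy to weight could be substituted in reliability_coefficients without changing the rest of the sketch.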

Original language: English
Title of host publication: Proceedings - 2023 China Automation Congress, CAC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 8642-8647
Number of pages: 6
ISBN (electronic): 9798350303759
DOI
Publication status: Published - 2023
Event: 2023 China Automation Congress, CAC 2023 - Chongqing, China
Duration: 17 Nov 2023 to 19 Nov 2023

Publication series

Name: Proceedings - 2023 China Automation Congress, CAC 2023

Conference

Conference: 2023 China Automation Congress, CAC 2023
Country/Territory: China
City: Chongqing
Period: 17/11/23 to 19/11/23
