Beyond Imitation: A Life-Long Policy Learning Framework for Path Tracking Control of Autonomous Driving

Cheng Gong; Chao Lu; Zirui Li; Zhe Liu; Jianwei Gong; Xuemei Chen

doi:10.1109/TVT.2024.3382309

Corresponding authors: Chao Lu, Jianwei Gong

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Model-free learning-based control methods have recently shown significant advantages over traditional control methods in avoiding complex vehicle characteristic estimation and parameter tuning. As a primary policy learning method, imitation learning (IL) is capable of learning control policies directly from expert demonstrations. However, the performance of IL policies is highly dependent on the data sufficiency and quality of the demonstrations. To alleviate the above problems of IL-based policies, a lifelong policy learning (LLPL) framework is proposed in this paper, which extends the IL scheme with lifelong learning (LLL). First, a novel IL-based model-free control policy learning method for path tracking is introduced. Even with imperfect demonstration, the optimal control policy can be learned directly from historical driving data. Second, by using the LLL method, the pre-trained IL policy can be safely updated and fine-tuned with incremental execution knowledge. Third, a knowledge evaluation method for policy learning is introduced to avoid learning redundant or inferior knowledge, thus ensuring the performance improvement of online policy learning. Experiments are conducted using a high-fidelity vehicle dynamic model in various scenarios to evaluate the performance of the proposed method. The results show that the proposed LLPL framework can continuously improve the policy performance with collected incremental driving data, and achieves the best accuracy and control smoothness compared to other baseline methods after evolving on a 7 km curved road. Through learning and evaluation with noisy real-life data collected in an off-road environment, the proposed LLPL framework also demonstrates its applicability in learning and evolving in real-life scenarios.

Original language	English
Pages (from-to)	9786-9799
Number of pages	14
Journal	IEEE Transactions on Vehicular Technology
Volume	73
Issue number	7
DOIs	https://doi.org/10.1109/TVT.2024.3382309
Publication status	Published - 2024

Keywords

Autonomous driving
learning from demonstration
life-long learning
model-free control
path tracking

Cite this

@article{b174f551c50442b4bec771554be8ac8f,
title = "Beyond Imitation: A Life-Long Policy Learning Framework for Path Tracking Control of Autonomous Driving",
abstract = "Model-free learning-based control methods have recently shown significant advantages over traditional control methods in avoiding complex vehicle characteristic estimation and parameter tuning. As a primary policy learning method, imitation learning (IL) is capable of learning control policies directly from expert demonstrations. However, the performance of IL policies is highly dependent on the data sufficiency and quality of the demonstrations. To alleviate the above problems of IL-based policies, a lifelong policy learning (LLPL) framework is proposed in this paper, which extends the IL scheme with lifelong learning (LLL). First, a novel IL-based model-free control policy learning method for path tracking is introduced. Even with imperfect demonstration, the optimal control policy can be learned directly from historical driving data. Second, by using the LLL method, the pre-trained IL policy can be safely updated and fine-tuned with incremental execution knowledge. Third, a knowledge evaluation method for policy learning is introduced to avoid learning redundant or inferior knowledge, thus ensuring the performance improvement of online policy learning. Experiments are conducted using a high-fidelity vehicle dynamic model in various scenarios to evaluate the performance of the proposed method. The results show that the proposed LLPL framework can continuously improve the policy performance with collected incremental driving data, and achieves the best accuracy and control smoothness compared to other baseline methods after evolving on a 7 km curved road. Through learning and evaluation with noisy real-life data collected in an off-road environment, the proposed LLPL framework also demonstrates its applicability in learning and evolving in real-life scenarios.",
keywords = "Autonomous driving, learning from demonstration, life-long learning, model-free control, path tracking",
author = "Cheng Gong and Chao Lu and Zirui Li and Zhe Liu and Jianwei Gong and Xuemei Chen",
note = "Publisher Copyright: {\textcopyright} 1967-2012 IEEE.",
year = "2024",
doi = "10.1109/TVT.2024.3382309",
language = "English",
volume = "73",
pages = "9786--9799",
journal = "IEEE Transactions on Vehicular Technology",
issn = "0018-9545",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "7",
}

TY - JOUR
T1 - Beyond Imitation
T2 - A Life-Long Policy Learning Framework for Path Tracking Control of Autonomous Driving
AU - Gong, Cheng
AU - Lu, Chao
AU - Li, Zirui
AU - Liu, Zhe
AU - Gong, Jianwei
AU - Chen, Xuemei
PY - 2024
Y1 - 2024
AB - Model-free learning-based control methods have recently shown significant advantages over traditional control methods in avoiding complex vehicle characteristic estimation and parameter tuning. As a primary policy learning method, imitation learning (IL) is capable of learning control policies directly from expert demonstrations. However, the performance of IL policies is highly dependent on the data sufficiency and quality of the demonstrations. To alleviate the above problems of IL-based policies, a lifelong policy learning (LLPL) framework is proposed in this paper, which extends the IL scheme with lifelong learning (LLL). First, a novel IL-based model-free control policy learning method for path tracking is introduced. Even with imperfect demonstration, the optimal control policy can be learned directly from historical driving data. Second, by using the LLL method, the pre-trained IL policy can be safely updated and fine-tuned with incremental execution knowledge. Third, a knowledge evaluation method for policy learning is introduced to avoid learning redundant or inferior knowledge, thus ensuring the performance improvement of online policy learning. Experiments are conducted using a high-fidelity vehicle dynamic model in various scenarios to evaluate the performance of the proposed method. The results show that the proposed LLPL framework can continuously improve the policy performance with collected incremental driving data, and achieves the best accuracy and control smoothness compared to other baseline methods after evolving on a 7 km curved road. Through learning and evaluation with noisy real-life data collected in an off-road environment, the proposed LLPL framework also demonstrates its applicability in learning and evolving in real-life scenarios.
KW - Autonomous driving
KW - learning from demonstration
KW - life-long learning
KW - model-free control
KW - path tracking
UR - http://www.scopus.com/inward/record.url?scp=85190173760&partnerID=8YFLogxK
U2 - 10.1109/TVT.2024.3382309
DO - 10.1109/TVT.2024.3382309
M3 - Article
AN - SCOPUS:85190173760
SN - 0018-9545
VL - 73
SP - 9786
EP - 9799
JO - IEEE Transactions on Vehicular Technology
JF - IEEE Transactions on Vehicular Technology
IS - 7
ER -
