Scene understanding in deep learning-based end-to-end controllers for autonomous vehicles

Shun Yang; Wenshuo Wang; Chang Liu; Weiwen Deng

doi:10.1109/TSMC.2018.2868372

Corresponding author: Weiwen Deng

Research output: Contribution to journal › Article › peer-review

75 Citations (Scopus)

Abstract

Deep learning techniques have been widely used in the autonomous driving community for environment perception. Recently, they have also been adopted for learning end-to-end controllers for complex driving scenarios. However, the complexity and nonlinearity of the network architecture limit the interpretability of how the network understands driving scenarios and judges the importance of certain visual regions in sensory scenes. In this paper, based on the convolutional neural network (CNN), we propose two complementary frameworks to automatically determine the most contributive regions of the input scenes, offering intuitive knowledge of how a trained end-to-end autonomous vehicle controller understands driving scenarios. In the first framework, a feature map-based method is proposed by leveraging current progress in CNN visualization, in which the deconvolution approach recovers the feature maps to extract the features that contribute most to understanding driving scenes. In the second framework, the importance level of regions is ranked using the error map between the labeled and predicted control inputs generated by occluding different parts of input scenes, thus providing a pixel-wise rank of importance. Test data sets with extracted contributive regions are input to the CNN controller. Then, different CNN controllers trained with the new data sets preprocessed using our proposed frameworks are verified via closed-loop tests. Results show that both the features identified from the first framework and the regions identified from the second framework are of crucial importance to scene understanding for the controller and can significantly affect the performance of CNN controllers.
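The occlusion idea behind the second framework can be illustrated with a minimal sketch. It is not the authors' implementation: the patch size, stride, fill value, and the `toy_controller` function (a hypothetical stand-in for a trained CNN controller) are all assumptions made for illustration, and the map is computed per patch rather than per pixel.

```python
import numpy as np


def occlusion_error_map(controller, image, label, patch=8, stride=8, fill=0.0):
    """Absolute control error caused by occluding each patch of the image.

    `controller` is any callable mapping an image to a scalar control value
    (assumed interface); `label` is the reference control value.
    """
    h, w = image.shape[:2]
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    error_map = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + patch, x:x + patch] = fill  # mask one region
            error_map[i, j] = abs(controller(occluded) - label)
    return error_map


def toy_controller(image):
    # Hypothetical stand-in for a trained CNN: the "steering" output depends
    # only on the mean intensity of the lower-left quadrant.
    h, w = image.shape[:2]
    return float(image[h // 2:, :w // 2].mean())


rng = np.random.default_rng(0)
image = rng.uniform(0.5, 1.0, size=(32, 32))
label = toy_controller(image)  # treat the unoccluded output as the label

emap = occlusion_error_map(toy_controller, image, label)
# Rank regions from most to least important (largest error first).
ranking = np.argsort(emap, axis=None)[::-1]
top = np.unravel_index(ranking[0], emap.shape)
```

With this toy controller, only patches inside the lower-left quadrant produce a nonzero error, so they occupy the top of the ranking. A real controller would replace `toy_controller` with a forward pass of the trained network.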

Original language	English
Article number	8480450
Pages (from-to)	53-63
Number of pages	11
Journal	IEEE Transactions on Systems, Man, and Cybernetics: Systems
Volume	49
Issue number	1
DOIs	https://doi.org/10.1109/TSMC.2018.2868372
Publication status	Published - Jan 2019
Externally published	Yes

Keywords

Autonomous vehicles
convolutional neural network (CNN)
scene understanding

Access to Document

10.1109/TSMC.2018.2868372

Cite this

@article{04673f6bfff449ee9d94be2234692568,

title = "Scene understanding in deep learning-based end-to-end controllers for autonomous vehicles",

abstract = "Deep learning techniques have been widely used in autonomous driving community for the purpose of environment perception. Recently, it starts being adopted for learning end-to-end controllers for complex driving scenarios. However, the complexity and nonlinearity of the network architecture limits its interpretability to understand driving scenarios and judge the importance of certain visual regions in sensory scenes. In this paper, based on the convolutional neural network (CNN), we propose two complementary frameworks to automatically determine the most contributive regions of the input scenes, offering intuitive knowledge of how a trained end-to-end autonomous vehicle controller understands driving scenarios. In the first framework, a feature map-based method is proposed by leveraging current progress in CNN visualization, in which the deconvolution approach recovers the feature maps to extract features that contribute most to understand driving scenes. In the second framework, the importance level of regions is ranked using the error map between the labeled and predicted control inputs generated by occluding different parts of input scenes, thus providing a pixel-wise rank of importance. Test data sets with extracted contributive regions are input to the CNN controller. Then, different CNN controllers trained with the new data sets preprocessed using our proposed frameworks are verified via closed-loop tests. Results show that both the features identified from the first framework and the regions identified from the second framework are of crucial importance to scene understanding for the controller and can significantly affect the performance of CNN controllers.",

keywords = "Autonomous vehicles, convolutional neural network (CNN), scene understanding",

author = "Shun Yang and Wenshuo Wang and Chang Liu and Weiwen Deng",

note = "Publisher Copyright: {\textcopyright} 2013 IEEE.",

year = "2019",

month = jan,

doi = "10.1109/TSMC.2018.2868372",

language = "English",

volume = "49",

pages = "53--63",

journal = "IEEE Transactions on Systems, Man, and Cybernetics: Systems",

issn = "2168-2216",

publisher = "IEEE Advancing Technology for Humanity",

number = "1",

}

TY - JOUR

T1 - Scene understanding in deep learning-based end-to-end controllers for autonomous vehicles

AU - Yang, Shun

AU - Wang, Wenshuo

AU - Liu, Chang

AU - Deng, Weiwen

PY - 2019/1

Y1 - 2019/1

N2 - Deep learning techniques have been widely used in autonomous driving community for the purpose of environment perception. Recently, it starts being adopted for learning end-to-end controllers for complex driving scenarios. However, the complexity and nonlinearity of the network architecture limits its interpretability to understand driving scenarios and judge the importance of certain visual regions in sensory scenes. In this paper, based on the convolutional neural network (CNN), we propose two complementary frameworks to automatically determine the most contributive regions of the input scenes, offering intuitive knowledge of how a trained end-to-end autonomous vehicle controller understands driving scenarios. In the first framework, a feature map-based method is proposed by leveraging current progress in CNN visualization, in which the deconvolution approach recovers the feature maps to extract features that contribute most to understand driving scenes. In the second framework, the importance level of regions is ranked using the error map between the labeled and predicted control inputs generated by occluding different parts of input scenes, thus providing a pixel-wise rank of importance. Test data sets with extracted contributive regions are input to the CNN controller. Then, different CNN controllers trained with the new data sets preprocessed using our proposed frameworks are verified via closed-loop tests. Results show that both the features identified from the first framework and the regions identified from the second framework are of crucial importance to scene understanding for the controller and can significantly affect the performance of CNN controllers.

AB - Deep learning techniques have been widely used in autonomous driving community for the purpose of environment perception. Recently, it starts being adopted for learning end-to-end controllers for complex driving scenarios. However, the complexity and nonlinearity of the network architecture limits its interpretability to understand driving scenarios and judge the importance of certain visual regions in sensory scenes. In this paper, based on the convolutional neural network (CNN), we propose two complementary frameworks to automatically determine the most contributive regions of the input scenes, offering intuitive knowledge of how a trained end-to-end autonomous vehicle controller understands driving scenarios. In the first framework, a feature map-based method is proposed by leveraging current progress in CNN visualization, in which the deconvolution approach recovers the feature maps to extract features that contribute most to understand driving scenes. In the second framework, the importance level of regions is ranked using the error map between the labeled and predicted control inputs generated by occluding different parts of input scenes, thus providing a pixel-wise rank of importance. Test data sets with extracted contributive regions are input to the CNN controller. Then, different CNN controllers trained with the new data sets preprocessed using our proposed frameworks are verified via closed-loop tests. Results show that both the features identified from the first framework and the regions identified from the second framework are of crucial importance to scene understanding for the controller and can significantly affect the performance of CNN controllers.

KW - Autonomous vehicles

KW - convolutional neural network (CNN)

KW - scene understanding

UR - http://www.scopus.com/inward/record.url?scp=85054552728&partnerID=8YFLogxK

U2 - 10.1109/TSMC.2018.2868372

DO - 10.1109/TSMC.2018.2868372

M3 - Article

AN - SCOPUS:85054552728

SN - 2168-2216

VL - 49

SP - 53

EP - 63

JO - IEEE Transactions on Systems, Man, and Cybernetics: Systems

JF - IEEE Transactions on Systems, Man, and Cybernetics: Systems

IS - 1

M1 - 8480450

ER -
