Traffic scene semantic segmentation using self-attention mechanism and bi-directional GRU to correlate context

Min Yan; Junzheng Wang; Jing Li; Ke Zhang; Zimu Yang

doi:10.1016/j.neucom.2019.12.007

Min Yan, Junzheng Wang, Jing Li^*, Ke Zhang, Zimu Yang

^*Corresponding author for this work

School of Automation

Beijing Institute of Technology

Research output: Contribution to journal › Article › Peer-reviewed

25 citations (Scopus)

Abstract

Context information plays an important role in semantic segmentation of urban traffic scenes, a key environment-perception task for intelligent platforms such as unmanned vehicles that has attracted wide interest from researchers. This paper combines three considerations: feature-space correlation, information distributed over long distances in the image plane, and long-distance sequence information. It proposes combining a self-attention mechanism with a bi-directional gated recurrent unit (GRU) neural network to extract various kinds of contextual information on top of a deep feature network, so as to achieve better semantic segmentation performance. To explore the optimal implementation, two topological connections are tried: one places the self-attention branch and the bi-directional GRU branch in series, and the other places them in parallel. In addition, to train the network better and obtain more precise segmentation results, a cascade refinement supervision method using two losses is proposed. Experiments on the Cityscapes, Mapillary, CamVid and KITTI semantic segmentation datasets demonstrate the outstanding performance and robust generalization ability of the method.
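The series and parallel connections mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the feature map is treated as a short flattened sequence of feature vectors, self-attention uses identity projections, the GRU uses one shared pair of scalar weights (`w`, `u`) instead of learned matrices, the two GRU directions are averaged, and the parallel branches are merged by summation. The function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V projections (assumption)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(weights[j] * seq[j][c] for j in range(len(seq)))
                    for c in range(d)])
    return out

def gru_pass(seq, w=0.5, u=0.5):
    """One-direction GRU with shared scalar weights per channel (illustrative only)."""
    h = [0.0] * len(seq[0])
    out = []
    for x in seq:
        z = [sigmoid(w * xi + u * hi) for xi, hi in zip(x, h)]   # update gate
        r = [sigmoid(w * xi + u * hi) for xi, hi in zip(x, h)]   # reset gate
        cand = [math.tanh(w * xi + u * ri * hi) for xi, ri, hi in zip(x, r, h)]
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
        out.append(h)
    return out

def bi_gru(seq):
    """Run the GRU forwards and backwards, then average the two directions (assumption)."""
    fwd = gru_pass(seq)
    bwd = gru_pass(seq[::-1])[::-1]
    return [[(f + b) / 2 for f, b in zip(fv, bv)] for fv, bv in zip(fwd, bwd)]

def series(seq):
    """Series topology: self-attention output feeds the bi-directional GRU."""
    return bi_gru(self_attention(seq))

def parallel(seq):
    """Parallel topology: both branches see the same input; outputs are summed (assumption)."""
    attn, rec = self_attention(seq), bi_gru(seq)
    return [[a + g for a, g in zip(av, gv)] for av, gv in zip(attn, rec)]

features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 3-position, 2-channel input
series_out = series(features)
parallel_out = parallel(features)
```

Both topologies return one context-enriched vector per input position; they differ only in whether the recurrent branch sees the attention output or the raw features.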

Original language	English
Pages (from-to)	293-304
Number of pages	12
Journal	Neurocomputing
Volume	386
DOI	https://doi.org/10.1016/j.neucom.2019.12.007
Publication status	Published - 21 Apr 2020

Cite this

@article{2a1e05f2262147cc8854342d7a027378,

title = "Traffic scene semantic segmentation using self-attention mechanism and bi-directional GRU to correlate context",

abstract = "Context information plays an important role in semantic segmentation of urban traffic scenes, which is one of the key tasks of the intelligent platform's (such as unmanned vehicles) perceiving environment, and has inspired a wide range of interests from researchers. This paper synthesizes three considerations: feature space correlation, information distributed in the long distance of image plane and long distance sequence information, and proposes a combination of self-attention mechanism and bi-directional gated recurrent unit (GRU) neural network to extract various contextual information on the basis of deep feature network, so as to achieve better semantic segmentation performance. In order to explore the optimal implementation, two kinds of topological connections are attempted. One is self-attention branch and bi-directional GRU branch in series, and the other is in parallel. In addition, in order to train the network better and achieve more precise segmentation results, a cascade refinement supervised method using two losses is proposed. Experiments carried out on Cityscapes, Mapillary, CamVid and KITTI semantic segmentation datasets demonstrate the outstanding performance and robust generalization ability of our method.",

keywords = "Context, Gated recurrent unit, Self-attention, Semantic segmentation",

author = "Min Yan and Junzheng Wang and Jing Li and Ke Zhang and Zimu Yang",

note = "Publisher Copyright: {\textcopyright} 2019 Elsevier B.V.",

year = "2020",

month = apr,

day = "21",

doi = "10.1016/j.neucom.2019.12.007",

language = "English",

volume = "386",

pages = "293--304",

journal = "Neurocomputing",

issn = "0925-2312",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Traffic scene semantic segmentation using self-attention mechanism and bi-directional GRU to correlate context

AU - Yan, Min

AU - Wang, Junzheng

AU - Li, Jing

AU - Zhang, Ke

AU - Yang, Zimu

PY - 2020/4/21

Y1 - 2020/4/21

N2 - Context information plays an important role in semantic segmentation of urban traffic scenes, which is one of the key tasks of the intelligent platform's (such as unmanned vehicles) perceiving environment, and has inspired a wide range of interests from researchers. This paper synthesizes three considerations: feature space correlation, information distributed in the long distance of image plane and long distance sequence information, and proposes a combination of self-attention mechanism and bi-directional gated recurrent unit (GRU) neural network to extract various contextual information on the basis of deep feature network, so as to achieve better semantic segmentation performance. In order to explore the optimal implementation, two kinds of topological connections are attempted. One is self-attention branch and bi-directional GRU branch in series, and the other is in parallel. In addition, in order to train the network better and achieve more precise segmentation results, a cascade refinement supervised method using two losses is proposed. Experiments carried out on Cityscapes, Mapillary, CamVid and KITTI semantic segmentation datasets demonstrate the outstanding performance and robust generalization ability of our method.

AB - Context information plays an important role in semantic segmentation of urban traffic scenes, which is one of the key tasks of the intelligent platform's (such as unmanned vehicles) perceiving environment, and has inspired a wide range of interests from researchers. This paper synthesizes three considerations: feature space correlation, information distributed in the long distance of image plane and long distance sequence information, and proposes a combination of self-attention mechanism and bi-directional gated recurrent unit (GRU) neural network to extract various contextual information on the basis of deep feature network, so as to achieve better semantic segmentation performance. In order to explore the optimal implementation, two kinds of topological connections are attempted. One is self-attention branch and bi-directional GRU branch in series, and the other is in parallel. In addition, in order to train the network better and achieve more precise segmentation results, a cascade refinement supervised method using two losses is proposed. Experiments carried out on Cityscapes, Mapillary, CamVid and KITTI semantic segmentation datasets demonstrate the outstanding performance and robust generalization ability of our method.

KW - Context

KW - Gated recurrent unit

KW - Self-attention

KW - Semantic segmentation

UR - http://www.scopus.com/inward/record.url?scp=85077715033&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2019.12.007

DO - 10.1016/j.neucom.2019.12.007

M3 - Article

AN - SCOPUS:85077715033

SN - 0925-2312

VL - 386

SP - 293

EP - 304

JO - Neurocomputing

JF - Neurocomputing

ER -
