Contrastive learning-based knowledge distillation for RGB-thermal urban scene semantic segmentation

doi:10.1016/j.knosys.2024.111588

Xiaodong Guo, Wujie Zhou*, Tong Liu

*Corresponding author for this work

School of Automation

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

RGB-thermal semantic segmentation helps unmanned platforms perceive and characterize their surrounding environment, which is critical for autonomous driving tasks. Deep-learning-based algorithms have achieved dominance in terms of accuracy and robustness. However, their large parameter counts and significant computational demands impede their deployment on terminal devices. To address this challenge, we propose a novel strategy for balancing effectiveness and compactness. It includes a robust teacher network, CLNet-T, and a streamlined student network, CLNet-S. Using knowledge distillation (KD), we obtain an optimized model called CLNet-S*. Specifically, CLNet-T and CLNet-S are identical in all aspects except for the feature extraction component. Both include a multi-attribute hierarchical feature interaction module (MHFI) and a detail-guided semantic decoder (DGSD). The MHFI first filters features according to the characteristics of low- and high-level features, then gradually combines complementary and common features from the different modalities in distinct receptive fields. The DGSD uses edge and distribution information to guide semantic decoding, thereby improving segmentation accuracy at class boundaries. To enhance the performance of the compact student model, our KD strategy includes detail and semantic response distillation (DSRD) and contrastive learning-based feature distillation (CLFD). In practice, DSRD enables the student model to gain knowledge from the teacher model at both the detail and semantic levels, while CLFD increases the similarity of features within the same category and emphasizes the distinctiveness of features between different categories in both the student and teacher models. Extensive experiments on two standard datasets consistently demonstrate that both CLNet-T and CLNet-S* outperform other state-of-the-art methods. The code and results are available at https://github.com/xiaodonguo/CLNet.
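The two distillation ideas named in the abstract can be illustrated with a small, generic sketch. It is not the authors' implementation and is not taken from the CLNet repository. It assumes a temperature-softened KL divergence as a stand-in for response distillation and an InfoNCE-style, label-aware loss as a stand-in for contrastive feature distillation. All function names, the temperatures, and the toy inputs are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one pixel's class logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def response_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean per-pixel KL(teacher || student) on softened class distributions.

    Assumption: a generic response-distillation loss, not the paper's DSRD.
    """
    total = 0.0
    for s_logits, t_logits in zip(student_logits, teacher_logits):
        p_teacher = softmax(t_logits, temperature)
        p_student = softmax(s_logits, temperature)
        total += sum(
            pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student)
        )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * total / len(student_logits)


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def contrastive_feature_loss(student_feats, teacher_feats, labels, tau=0.1):
    """Label-aware InfoNCE-style loss between student and teacher features.

    Each student feature is an anchor. Teacher features with the same label
    are positives, all others are negatives. Assumption: a generic
    contrastive loss, not the paper's CLFD.
    """
    losses = []
    for i, anchor in enumerate(student_feats):
        sims = [math.exp(cosine(anchor, t) / tau) for t in teacher_feats]
        positives = sum(s for s, lab in zip(sims, labels) if lab == labels[i])
        losses.append(-math.log(positives / sum(sims)))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy data: two pixels, two classes, two-dimensional features.
    labels = [0, 1]
    student = [[1.0, 0.0], [0.0, 1.0]]
    teacher = [[0.9, 0.1], [0.2, 0.8]]
    print(contrastive_feature_loss(student, teacher, labels))
    print(response_kd_loss([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]))
```

In a sketch like this, the contrastive term shrinks as student features move toward same-class teacher features and away from other classes, and the response term is zero when student and teacher logits agree. How the actual method weights and combines its losses is described in the paper, not here.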

Original language	English
Article number	111588
Journal	Knowledge-Based Systems
Volume	292
DOIs	https://doi.org/10.1016/j.knosys.2024.111588
Publication status	Published - 23 May 2024

Keywords

Autonomous driving
Contrastive learning
Knowledge distillation
RGB-Thermal semantic segmentation
Urban scene

Cite this

Guo, X., Zhou, W., & Liu, T. (2024). Contrastive learning-based knowledge distillation for RGB-thermal urban scene semantic segmentation. Knowledge-Based Systems, 292, Article 111588. https://doi.org/10.1016/j.knosys.2024.111588

@article{d5b0af16c89a43c483a555df510f042e,

title = "Contrastive learning-based knowledge distillation for RGB-thermal urban scene semantic segmentation",

abstract = "RGB thermal semantic segmentation facilitates unmanned platforms to perceive and characterize their surrounding environment, which is critical for autonomous driving tasks. Deep-learning-based algorithms have achieved dominance in terms of accuracy and robustness. However, their large parameter sizes and significant computational demands impede their application in terminal devices. To address this challenge, we propose a novel strategy for achieving a balance between effectiveness and compactness. It includes a robust teacher network, CLNet-T, and a streamlined student network, CLNet-S. Using knowledge distillation (KD), we obtained an optimized model called CLNet-S*. Specifically, CLNet-T and CLNet-S were identical in all aspects except for the feature extraction component. They included a multi-attribute hierarchical feature interaction module (MHFI) and a detail-guided semantic decoder (DGSD). The MHFI initially filters features by considering the characteristics of the low- and high-level features. It gradually combines complementary and common features from various modalities in distinct receptive fields. DGSD uses edge and distribution information to guide semantic decoding, thereby improving the segmentation accuracy at class boundaries. To enhance the performance of the compact student model, our KD strategy includes detail, semantic response distillation (DSRD), and contrastive learning-based feature distillation (CLFD). Practically, DSRD enables the student model to gain knowledge from the teacher model at both the detailed and semantic levels. At the same time, CLFD increases the similarity of features within the same categories and emphasizes the distinctiveness of features between different categories in both the student and teacher models. Extensive experiments conducted on two standard datasets have consistently demonstrated that both CLNet-T and CLNet-S* outperform other state-of-the-art methods. 
The code and results are available at https://github.com/xiaodonguo/CLNet.",

keywords = "Autonomous driving, Contrastive learning, Knowledge distillation, RGB-Thermal semantic segmentation, Urban scene",

author = "Xiaodong Guo and Wujie Zhou and Tong Liu",

note = "Publisher Copyright: {\textcopyright} 2024 Elsevier B.V.",

year = "2024",

month = may,

day = "23",

doi = "10.1016/j.knosys.2024.111588",

language = "English",

volume = "292",

journal = "Knowledge-Based Systems",

issn = "0950-7051",

publisher = "Elsevier B.V.",

}

TY - JOUR

T1 - Contrastive learning-based knowledge distillation for RGB-thermal urban scene semantic segmentation

AU - Guo, Xiaodong

AU - Zhou, Wujie

AU - Liu, Tong

PY - 2024/5/23

Y1 - 2024/5/23

N2 - RGB thermal semantic segmentation facilitates unmanned platforms to perceive and characterize their surrounding environment, which is critical for autonomous driving tasks. Deep-learning-based algorithms have achieved dominance in terms of accuracy and robustness. However, their large parameter sizes and significant computational demands impede their application in terminal devices. To address this challenge, we propose a novel strategy for achieving a balance between effectiveness and compactness. It includes a robust teacher network, CLNet-T, and a streamlined student network, CLNet-S. Using knowledge distillation (KD), we obtained an optimized model called CLNet-S*. Specifically, CLNet-T and CLNet-S were identical in all aspects except for the feature extraction component. They included a multi-attribute hierarchical feature interaction module (MHFI) and a detail-guided semantic decoder (DGSD). The MHFI initially filters features by considering the characteristics of the low- and high-level features. It gradually combines complementary and common features from various modalities in distinct receptive fields. DGSD uses edge and distribution information to guide semantic decoding, thereby improving the segmentation accuracy at class boundaries. To enhance the performance of the compact student model, our KD strategy includes detail, semantic response distillation (DSRD), and contrastive learning-based feature distillation (CLFD). Practically, DSRD enables the student model to gain knowledge from the teacher model at both the detailed and semantic levels. At the same time, CLFD increases the similarity of features within the same categories and emphasizes the distinctiveness of features between different categories in both the student and teacher models. Extensive experiments conducted on two standard datasets have consistently demonstrated that both CLNet-T and CLNet-S* outperform other state-of-the-art methods. 
The code and results are available at https://github.com/xiaodonguo/CLNet.

AB - RGB thermal semantic segmentation facilitates unmanned platforms to perceive and characterize their surrounding environment, which is critical for autonomous driving tasks. Deep-learning-based algorithms have achieved dominance in terms of accuracy and robustness. However, their large parameter sizes and significant computational demands impede their application in terminal devices. To address this challenge, we propose a novel strategy for achieving a balance between effectiveness and compactness. It includes a robust teacher network, CLNet-T, and a streamlined student network, CLNet-S. Using knowledge distillation (KD), we obtained an optimized model called CLNet-S*. Specifically, CLNet-T and CLNet-S were identical in all aspects except for the feature extraction component. They included a multi-attribute hierarchical feature interaction module (MHFI) and a detail-guided semantic decoder (DGSD). The MHFI initially filters features by considering the characteristics of the low- and high-level features. It gradually combines complementary and common features from various modalities in distinct receptive fields. DGSD uses edge and distribution information to guide semantic decoding, thereby improving the segmentation accuracy at class boundaries. To enhance the performance of the compact student model, our KD strategy includes detail, semantic response distillation (DSRD), and contrastive learning-based feature distillation (CLFD). Practically, DSRD enables the student model to gain knowledge from the teacher model at both the detailed and semantic levels. At the same time, CLFD increases the similarity of features within the same categories and emphasizes the distinctiveness of features between different categories in both the student and teacher models. Extensive experiments conducted on two standard datasets have consistently demonstrated that both CLNet-T and CLNet-S* outperform other state-of-the-art methods. 
The code and results are available at https://github.com/xiaodonguo/CLNet.

KW - Autonomous driving

KW - Contrastive learning

KW - Knowledge distillation

KW - RGB-Thermal semantic segmentation

KW - Urban scene

UR - http://www.scopus.com/inward/record.url?scp=85188845184&partnerID=8YFLogxK

U2 - 10.1016/j.knosys.2024.111588

DO - 10.1016/j.knosys.2024.111588

M3 - Article

AN - SCOPUS:85188845184

SN - 0950-7051

VL - 292

JO - Knowledge-Based Systems

JF - Knowledge-Based Systems

M1 - 111588

ER -
