TY - GEN
T1 - Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm
T2 - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
AU - Ma, Wenxuan
AU - Li, Shuang
AU - Zhang, Jin Ming
AU - Liu, Chi Harold
AU - Kang, Jingxuan
AU - Wang, Yulin
AU - Huang, Gao
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The development of vision models for real-world applications is hindered by the challenge of annotated data scarcity, which has necessitated the adoption of data-efficient visual learning techniques such as semi-supervised learning. Unfortunately, the prevalent cross-entropy supervision is limited by its focus on category discrimination while disregarding the semantic connection between concepts, which ultimately results in the suboptimal exploitation of scarce labeled data. To address this issue, this paper presents a novel approach that seeks to leverage linguistic knowledge for data-efficient visual learning. The proposed approach, BorLan, Borrows knowledge from off-the-shelf pre-trained Language models that are already endowed with rich semantics extracted from large corpora, to compensate for the semantic deficiency due to limited annotation in visual training. Specifically, we design a distribution alignment objective, which guides the vision model to learn both semantic-aware and domain-agnostic representations for the task through linguistic knowledge. One significant advantage of this paradigm is its flexibility in combining various visual and linguistic models. Extensive experiments on semi-supervised learning, single domain generalization and few-shot learning validate its effectiveness. Code is available at https://github.com/BIT-DA/BorLan.
AB - The development of vision models for real-world applications is hindered by the challenge of annotated data scarcity, which has necessitated the adoption of data-efficient visual learning techniques such as semi-supervised learning. Unfortunately, the prevalent cross-entropy supervision is limited by its focus on category discrimination while disregarding the semantic connection between concepts, which ultimately results in the suboptimal exploitation of scarce labeled data. To address this issue, this paper presents a novel approach that seeks to leverage linguistic knowledge for data-efficient visual learning. The proposed approach, BorLan, Borrows knowledge from off-the-shelf pre-trained Language models that are already endowed with rich semantics extracted from large corpora, to compensate for the semantic deficiency due to limited annotation in visual training. Specifically, we design a distribution alignment objective, which guides the vision model to learn both semantic-aware and domain-agnostic representations for the task through linguistic knowledge. One significant advantage of this paradigm is its flexibility in combining various visual and linguistic models. Extensive experiments on semi-supervised learning, single domain generalization and few-shot learning validate its effectiveness. Code is available at https://github.com/BIT-DA/BorLan.
UR - http://www.scopus.com/inward/record.url?scp=85183405622&partnerID=8YFLogxK
U2 - 10.1109/ICCV51070.2023.01722
DO - 10.1109/ICCV51070.2023.01722
M3 - Conference contribution
AN - SCOPUS:85183405622
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 18740
EP - 18751
BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 2 October 2023 through 6 October 2023
ER -