Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm

Wenxuan Ma; Shuang Li; Jin Ming Zhang; Chi Harold Liu; Jingxuan Kang; Yulin Wang; Gao Huang

doi:10.1109/ICCV51070.2023.01722

Wenxuan Ma, Shuang Li^*, Jin Ming Zhang, Chi Harold Liu, Jingxuan Kang, Yulin Wang, Gao Huang

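^*Corresponding author for this work

School of Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract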

The development of vision models for real-world applications is hindered by the challenge of annotated data scarcity, which has necessitated the adoption of data-efficient visual learning techniques such as semi-supervised learning. Unfortunately, the prevalent cross-entropy supervision is limited by its focus on category discrimination while disregarding the semantic connection between concepts, which ultimately results in the suboptimal exploitation of scarce labeled data. To address this issue, this paper presents a novel approach that seeks to leverage linguistic knowledge for data-efficient visual learning. The proposed approach, BorLan, Borrows knowledge from off-the-shelf pretrained Language models that are already endowed with rich semantics extracted from large corpora, to compensate for the semantic deficiency due to limited annotation in visual training. Specifically, we design a distribution alignment objective, which guides the vision model to learn both semantic-aware and domain-agnostic representations for the task through linguistic knowledge. One significant advantage of this paradigm is its flexibility in combining various visual and linguistic models. Extensive experiments on semi-supervised learning, single domain generalization and few-shot learning validate its effectiveness. Code is available at https://github.com/BIT-DA/BorLan.
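The abstract does not spell out the distribution alignment objective, so the following is only a minimal sketch of one way such an objective could look, not the BorLan implementation. It assumes that class names have already been embedded by a language model, that image features live in the same space (for example after a projection head), and that alignment is measured with a KL divergence between two softmax distributions over classes. The function names, the temperature value, and the toy vectors in the usage note are all hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def alignment_loss(image_feature, label, class_text_embeddings, temperature=0.1):
    """Hypothetical alignment loss (an assumption, not the paper's formula).

    Target: how the labeled class relates to every class in language space.
    Prediction: how the image feature relates to every class embedding.
    The loss is the KL divergence from target to prediction.
    """
    anchor = class_text_embeddings[label]
    target = softmax(
        [cosine(anchor, t) for t in class_text_embeddings], temperature
    )
    predicted = softmax(
        [cosine(image_feature, t) for t in class_text_embeddings], temperature
    )
    return kl_divergence(target, predicted)
```

With toy class embeddings such as `[[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]`, calling `alignment_loss([1.0, 0.0], 0, embeddings)` yields a loss of zero because the image feature coincides with the text embedding of its class, while a feature pointing at a different class yields a positive loss. Unlike one-hot cross-entropy, the target here keeps probability mass on semantically related classes, which is the kind of inter-concept relation the abstract says cross-entropy disregards.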

Original language	English
Title of host publication	Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	18740-18751
Number of pages	12
ISBN (Electronic)	9798350307184
DOI	https://doi.org/10.1109/ICCV51070.2023.01722
Publication status	Published - 2023
Event	2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 - Paris, France. Duration: 2 Oct 2023 → 6 Oct 2023

Publication series

Name	Proceedings of the IEEE International Conference on Computer Vision
ISSN (Print)	1550-5499

Conference

Conference	2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Country/Territory	France
City	Paris
Period	2/10/23 → 6/10/23

Access to document

10.1109/ICCV51070.2023.01722

Other files and links

Link to publication in Scopus

Cite this

Ma, W., Li, S., Zhang, J. M., Liu, C. H., Kang, J., Wang, Y., & Huang, G. (2023). Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 (pp. 18740-18751). (Proceedings of the IEEE International Conference on Computer Vision). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV51070.2023.01722

Ma, Wenxuan ; Li, Shuang ; Zhang, Jin Ming et al. / Borrowing Knowledge From Pre-trained Language Model : A New Data-efficient Visual Learning Paradigm. Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 18740-18751 (Proceedings of the IEEE International Conference on Computer Vision).

@inproceedings{2730d2c95a81460c9393176cbb7ee02e,

title = "Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm",

abstract = "The development of vision models for real-world applications is hindered by the challenge of annotated data scarcity, which has necessitated the adoption of data-efficient visual learning techniques such as semi-supervised learning. Unfortunately, the prevalent cross-entropy supervision is limited by its focus on category discrimination while disregarding the semantic connection between concepts, which ultimately results in the suboptimal exploitation of scarce labeled data. To address this issue, this paper presents a novel approach that seeks to leverage linguistic knowledge for data-efficient visual learning. The proposed approach, BorLan, Borrows knowledge from off-the-shelf pretrained Language models that are already endowed with rich semantics extracted from large corpora, to compensate for the semantic deficiency due to limited annotation in visual training. Specifically, we design a distribution alignment objective, which guides the vision model to learn both semantic-aware and domain-agnostic representations for the task through linguistic knowledge. One significant advantage of this paradigm is its flexibility in combining various visual and linguistic models. Extensive experiments on semi-supervised learning, single domain generalization and few-shot learning validate its effectiveness. Code is available at https://github.com/BIT-DA/BorLan.",

author = "Wenxuan Ma and Shuang Li and Zhang, {Jin Ming} and Liu, {Chi Harold} and Jingxuan Kang and Yulin Wang and Gao Huang",

note = "Publisher Copyright: {\textcopyright} 2023 IEEE.; 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 ; Conference date: 02-10-2023 Through 06-10-2023",

year = "2023",

doi = "10.1109/ICCV51070.2023.01722",

language = "English",

series = "Proceedings of the IEEE International Conference on Computer Vision",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "18740--18751",

booktitle = "Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023",

address = "United States",

}

Ma, W, Li, S, Zhang, JM, Liu, CH, Kang, J, Wang, Y & Huang, G 2023, Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm. in Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Proceedings of the IEEE International Conference on Computer Vision, Institute of Electrical and Electronics Engineers Inc., pp. 18740-18751, 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, 2/10/23. https://doi.org/10.1109/ICCV51070.2023.01722

Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm. / Ma, Wenxuan; Li, Shuang; Zhang, Jin Ming et al.
Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Institute of Electrical and Electronics Engineers Inc., 2023. pp. 18740-18751 (Proceedings of the IEEE International Conference on Computer Vision).

TY - GEN

T1 - Borrowing Knowledge From Pre-trained Language Model

T2 - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023

AU - Ma, Wenxuan

AU - Li, Shuang

AU - Zhang, Jin Ming

AU - Liu, Chi Harold

AU - Kang, Jingxuan

AU - Wang, Yulin

AU - Huang, Gao

PY - 2023

Y1 - 2023

N2 - The development of vision models for real-world applications is hindered by the challenge of annotated data scarcity, which has necessitated the adoption of data-efficient visual learning techniques such as semi-supervised learning. Unfortunately, the prevalent cross-entropy supervision is limited by its focus on category discrimination while disregarding the semantic connection between concepts, which ultimately results in the suboptimal exploitation of scarce labeled data. To address this issue, this paper presents a novel approach that seeks to leverage linguistic knowledge for data-efficient visual learning. The proposed approach, BorLan, Borrows knowledge from off-the-shelf pretrained Language models that are already endowed with rich semantics extracted from large corpora, to compensate for the semantic deficiency due to limited annotation in visual training. Specifically, we design a distribution alignment objective, which guides the vision model to learn both semantic-aware and domain-agnostic representations for the task through linguistic knowledge. One significant advantage of this paradigm is its flexibility in combining various visual and linguistic models. Extensive experiments on semi-supervised learning, single domain generalization and few-shot learning validate its effectiveness. Code is available at https://github.com/BIT-DA/BorLan.

AB - The development of vision models for real-world applications is hindered by the challenge of annotated data scarcity, which has necessitated the adoption of data-efficient visual learning techniques such as semi-supervised learning. Unfortunately, the prevalent cross-entropy supervision is limited by its focus on category discrimination while disregarding the semantic connection between concepts, which ultimately results in the suboptimal exploitation of scarce labeled data. To address this issue, this paper presents a novel approach that seeks to leverage linguistic knowledge for data-efficient visual learning. The proposed approach, BorLan, Borrows knowledge from off-the-shelf pretrained Language models that are already endowed with rich semantics extracted from large corpora, to compensate for the semantic deficiency due to limited annotation in visual training. Specifically, we design a distribution alignment objective, which guides the vision model to learn both semantic-aware and domain-agnostic representations for the task through linguistic knowledge. One significant advantage of this paradigm is its flexibility in combining various visual and linguistic models. Extensive experiments on semi-supervised learning, single domain generalization and few-shot learning validate its effectiveness. Code is available at https://github.com/BIT-DA/BorLan.

UR - http://www.scopus.com/inward/record.url?scp=85183405622&partnerID=8YFLogxK

U2 - 10.1109/ICCV51070.2023.01722

DO - 10.1109/ICCV51070.2023.01722

M3 - Conference contribution

AN - SCOPUS:85183405622

T3 - Proceedings of the IEEE International Conference on Computer Vision

SP - 18740

EP - 18751

BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 2 October 2023 through 6 October 2023

ER -

Ma W, Li S, Zhang JM, Liu CH, Kang J, Wang Y et al. Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm. In Proceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023. Institute of Electrical and Electronics Engineers Inc. 2023. p. 18740-18751. (Proceedings of the IEEE International Conference on Computer Vision). doi: 10.1109/ICCV51070.2023.01722
