TY - GEN
T1 - Multi-Modal Domain Generalization for Cross-Scene Hyperspectral Image Classification
AU - Zhang, Yuxiang
AU - Zhang, Mengmeng
AU - Li, Wei
AU - Tao, Ran
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Large-scale pre-trained image-text foundation models have excelled in a number of downstream applications. However, most domain generalization techniques have not exploited linguistic modal knowledge to enhance model generalization performance. Additionally, text information has been ignored in hyperspectral image (HSI) classification tasks. To address these shortcomings, a Multi-modal Domain Generalization Network (MDG) is proposed to learn cross-domain invariant representations from a cross-domain shared semantic space. Only the source domain (SD) is used for training, after which the model is transferred directly to the target domain (TD). Visual and linguistic features are extracted by a dual-stream architecture consisting of an image encoder and a text encoder. A generator is designed to obtain extended-domain (ED) samples that differ from the SD. Furthermore, linguistic features are used to construct the cross-domain shared semantic space, where visual-linguistic alignment is accomplished by supervised contrastive learning. Extensive experiments on two datasets show that the proposed method outperforms state-of-the-art approaches.
AB - Large-scale pre-trained image-text foundation models have excelled in a number of downstream applications. However, most domain generalization techniques have not exploited linguistic modal knowledge to enhance model generalization performance. Additionally, text information has been ignored in hyperspectral image (HSI) classification tasks. To address these shortcomings, a Multi-modal Domain Generalization Network (MDG) is proposed to learn cross-domain invariant representations from a cross-domain shared semantic space. Only the source domain (SD) is used for training, after which the model is transferred directly to the target domain (TD). Visual and linguistic features are extracted by a dual-stream architecture consisting of an image encoder and a text encoder. A generator is designed to obtain extended-domain (ED) samples that differ from the SD. Furthermore, linguistic features are used to construct the cross-domain shared semantic space, where visual-linguistic alignment is accomplished by supervised contrastive learning. Extensive experiments on two datasets show that the proposed method outperforms state-of-the-art approaches.
KW - Contrastive Learning
KW - Cross-Scene
KW - Domain Generalization
KW - Hyperspectral Image Classification
KW - Multiple-modality
KW - Natural Language Supervision
UR - http://www.scopus.com/inward/record.url?scp=85177596915&partnerID=8YFLogxK
U2 - 10.1109/ICASSP49357.2023.10095723
DO - 10.1109/ICASSP49357.2023.10095723
M3 - Conference contribution
AN - SCOPUS:85177596915
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Y2 - 4 June 2023 through 10 June 2023
ER -