TY - JOUR
T1 - Language-Aware Domain Generalization Network for Cross-Scene Hyperspectral Image Classification
AU - Zhang, Yuxiang
AU - Zhang, Mengmeng
AU - Li, Wei
AU - Wang, Shuai
AU - Tao, Ran
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Text information, which includes extensive prior knowledge about land-cover classes, has been ignored in hyperspectral image (HSI) classification tasks. It is therefore necessary to explore the effectiveness of the linguistic modality in assisting HSI classification. In addition, large-scale pretrained image-text foundation models have demonstrated strong performance in a variety of downstream applications, including zero-shot transfer. However, most domain generalization methods have never addressed mining linguistic modal knowledge to improve the generalization performance of the model. To address these shortcomings, a language-aware domain generalization network (LDGnet) is proposed to learn cross-domain-invariant representations from cross-domain shared prior knowledge. The proposed method trains only on the source domain (SD) and then transfers the model to the target domain (TD). A dual-stream architecture comprising an image encoder and a text encoder is used to extract visual and linguistic features, in which coarse-grained and fine-grained text representations are designed to extract two levels of linguistic features. Furthermore, the linguistic features serve as a cross-domain shared semantic space, and visual-linguistic alignment is achieved by supervised contrastive learning in this space. Extensive experiments on three datasets demonstrate the superiority of the proposed method compared with state-of-the-art techniques. The code will be available at https://github.com/YuxiangZhang-BIT/IEEE_TGRS_LDGnet.
AB - Text information, which includes extensive prior knowledge about land-cover classes, has been ignored in hyperspectral image (HSI) classification tasks. It is therefore necessary to explore the effectiveness of the linguistic modality in assisting HSI classification. In addition, large-scale pretrained image-text foundation models have demonstrated strong performance in a variety of downstream applications, including zero-shot transfer. However, most domain generalization methods have never addressed mining linguistic modal knowledge to improve the generalization performance of the model. To address these shortcomings, a language-aware domain generalization network (LDGnet) is proposed to learn cross-domain-invariant representations from cross-domain shared prior knowledge. The proposed method trains only on the source domain (SD) and then transfers the model to the target domain (TD). A dual-stream architecture comprising an image encoder and a text encoder is used to extract visual and linguistic features, in which coarse-grained and fine-grained text representations are designed to extract two levels of linguistic features. Furthermore, the linguistic features serve as a cross-domain shared semantic space, and visual-linguistic alignment is achieved by supervised contrastive learning in this space. Extensive experiments on three datasets demonstrate the superiority of the proposed method compared with state-of-the-art techniques. The code will be available at https://github.com/YuxiangZhang-BIT/IEEE_TGRS_LDGnet.
KW - Contrastive learning
KW - cross-scene
KW - domain generalization
KW - hyperspectral image (HSI) classification
KW - multiple-modality
KW - natural language supervision
UR - http://www.scopus.com/inward/record.url?scp=85147204785&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2022.3233885
DO - 10.1109/TGRS.2022.3233885
M3 - Article
AN - SCOPUS:85147204785
SN - 0196-2892
VL - 61
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5501312
ER -