TY - JOUR
T1 - Language-Aware Domain Generalization Network for Cross-Scene Hyperspectral Image Classification
AU - Zhang, Yuxiang
AU - Zhang, Mengmeng
AU - Li, Wei
AU - Wang, Shuai
AU - Tao, Ran
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Text information, which includes extensive prior knowledge about land-cover classes, has been ignored in hyperspectral image (HSI) classification tasks. It is therefore necessary to explore the effectiveness of the linguistic modality in assisting HSI classification. In addition, large-scale pretrained image-text foundation models have demonstrated strong performance in a variety of downstream applications, including zero-shot transfer. However, most domain generalization methods have never addressed mining linguistic modal knowledge to improve the generalization performance of the model. To address these shortcomings, a language-aware domain generalization network (LDGnet) is proposed to learn cross-domain-invariant representations from cross-domain shared prior knowledge. The proposed method trains only on the source domain (SD) and then transfers the model to the target domain (TD). A dual-stream architecture comprising an image encoder and a text encoder is used to extract visual and linguistic features, in which coarse-grained and fine-grained text representations are designed to extract two levels of linguistic features. Furthermore, the linguistic features serve as a cross-domain shared semantic space, and visual-linguistic alignment is achieved by supervised contrastive learning in this space. Extensive experiments on three datasets demonstrate the superiority of the proposed method compared with state-of-the-art techniques. The code will be available at https://github.com/YuxiangZhang-BIT/IEEE_TGRS_LDGnet.
AB - Text information, which includes extensive prior knowledge about land-cover classes, has been ignored in hyperspectral image (HSI) classification tasks. It is therefore necessary to explore the effectiveness of the linguistic modality in assisting HSI classification. In addition, large-scale pretrained image-text foundation models have demonstrated strong performance in a variety of downstream applications, including zero-shot transfer. However, most domain generalization methods have never addressed mining linguistic modal knowledge to improve the generalization performance of the model. To address these shortcomings, a language-aware domain generalization network (LDGnet) is proposed to learn cross-domain-invariant representations from cross-domain shared prior knowledge. The proposed method trains only on the source domain (SD) and then transfers the model to the target domain (TD). A dual-stream architecture comprising an image encoder and a text encoder is used to extract visual and linguistic features, in which coarse-grained and fine-grained text representations are designed to extract two levels of linguistic features. Furthermore, the linguistic features serve as a cross-domain shared semantic space, and visual-linguistic alignment is achieved by supervised contrastive learning in this space. Extensive experiments on three datasets demonstrate the superiority of the proposed method compared with state-of-the-art techniques. The code will be available at https://github.com/YuxiangZhang-BIT/IEEE_TGRS_LDGnet.
KW - Contrastive learning
KW - cross-scene
KW - domain generalization
KW - hyperspectral image (HSI) classification
KW - multiple-modality
KW - natural language supervision
UR - http://www.scopus.com/inward/record.url?scp=85147204785&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2022.3233885
DO - 10.1109/TGRS.2022.3233885
M3 - Article
AN - SCOPUS:85147204785
SN - 0196-2892
VL - 61
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5501312
ER -