TY - JOUR
T1 - Consecutive Pre-Training
T2 - A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain
AU - Zhang, Tong
AU - Gao, Peng
AU - Dong, Hao
AU - Zhuang, Yin
AU - Wang, Guanqun
AU - Zhang, Wei
AU - Chen, He
N1 - Publisher Copyright:
© 2022 by the authors.
PY - 2022/11
Y1 - 2022/11
N2 - Currently, under supervised learning, a model pre-trained on a large-scale natural scene dataset and then fine-tuned on a small amount of task-specific labeled data is the dominant paradigm of knowledge transfer learning. Unfortunately, owing to the diverse categories of imaging data and the stiff challenges of data annotation, there is no sufficiently large and uniform remote sensing dataset to support large-scale pre-training in the remote sensing domain (RSD). Moreover, pre-training models on large-scale natural scene datasets by supervised learning and then directly fine-tuning them on diverse downstream tasks is a crude approach that is easily affected by inevitable labeling errors, severe domain gaps and task-aware discrepancies. Thus, in this paper, considering self-supervised pre-training and the powerful vision transformer (ViT) architecture, a concise and effective knowledge transfer learning strategy called ConSecutive Pre-Training (CSPT) is proposed, based on the idea of not stopping pre-training in natural language processing (NLP), which can gradually bridge the domain gap and transfer large-scale data knowledge to any specific domain (e.g., from the natural scene domain to the RSD). In addition, the proposed CSPT can also release the huge potential of unlabeled data for task-aware model training. Finally, extensive experiments were carried out on twelve remote sensing datasets covering three types of downstream tasks (i.e., scene classification, object detection and land cover classification) and two types of imaging data (i.e., optical and synthetic aperture radar (SAR)). The results show that by utilizing the proposed CSPT for task-aware model training, almost all downstream tasks in the RSD outperform previous knowledge transfer learning strategies based on model pre-training, without any expensive manual labeling, and can even surpass state-of-the-art (SOTA) performance without careful network architecture design.
AB - Currently, under supervised learning, a model pre-trained on a large-scale natural scene dataset and then fine-tuned on a small amount of task-specific labeled data is the dominant paradigm of knowledge transfer learning. Unfortunately, owing to the diverse categories of imaging data and the stiff challenges of data annotation, there is no sufficiently large and uniform remote sensing dataset to support large-scale pre-training in the remote sensing domain (RSD). Moreover, pre-training models on large-scale natural scene datasets by supervised learning and then directly fine-tuning them on diverse downstream tasks is a crude approach that is easily affected by inevitable labeling errors, severe domain gaps and task-aware discrepancies. Thus, in this paper, considering self-supervised pre-training and the powerful vision transformer (ViT) architecture, a concise and effective knowledge transfer learning strategy called ConSecutive Pre-Training (CSPT) is proposed, based on the idea of not stopping pre-training in natural language processing (NLP), which can gradually bridge the domain gap and transfer large-scale data knowledge to any specific domain (e.g., from the natural scene domain to the RSD). In addition, the proposed CSPT can also release the huge potential of unlabeled data for task-aware model training. Finally, extensive experiments were carried out on twelve remote sensing datasets covering three types of downstream tasks (i.e., scene classification, object detection and land cover classification) and two types of imaging data (i.e., optical and synthetic aperture radar (SAR)). The results show that by utilizing the proposed CSPT for task-aware model training, almost all downstream tasks in the RSD outperform previous knowledge transfer learning strategies based on model pre-training, without any expensive manual labeling, and can even surpass state-of-the-art (SOTA) performance without careful network architecture design.
KW - knowledge transfer learning
KW - remote sensing domain
KW - self-supervised learning
KW - vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85142741642&partnerID=8YFLogxK
U2 - 10.3390/rs14225675
DO - 10.3390/rs14225675
M3 - Article
AN - SCOPUS:85142741642
SN - 2072-4292
VL - 14
JO - Remote Sensing
JF - Remote Sensing
IS - 22
M1 - 5675
ER -