TY - JOUR
T1 - FPNFormer
T2 - Rethink the Method of Processing the Rotation-Invariance and Rotation-Equivariance on Arbitrary-Oriented Object Detection
AU - Tian, Yang
AU - Zhang, Mengmeng
AU - Li, Jinyu
AU - Li, Yangfan
AU - Yang, Hong
AU - Li, Wei
N1 - Publisher Copyright:
© 1980-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - We propose the feature pyramid network transformer decoder (FPNFormer) module, which can effectively handle the arbitrary orientations of objects in remote sensing images while improving the expressiveness and robustness of the model. It is a plug-and-play module that transfers well to various detection models and significantly improves their performance. Specifically, we use the computation of the transformer decoder, whose output depends only weakly on the order of the input data, to address the fact that images may appear in any orientation. We apply it to the feature fusion stage and design two schemes, top-down and bottom-up, to fuse features of different scales, giving the model a stronger ability to perceive objects at different scales and angles. Experiments on commonly used benchmarks (DOTA1.0, DOTA1.5, SSDD, and RSDD) demonstrate that the proposed FPNFormer module significantly improves the performance of multiple arbitrary-oriented object detectors, e.g., a 1.99% mAP improvement for Rotated RetinaNet on DOTA's cross-validation set. On the RSDD dataset, the baseline model with FPNFormer improves the mAP of large objects by 5.1%. Combined with more competitive models, the proposed method achieves 79.39% mAP on the DOTA1.0 dataset. The code is available at https://github.com/bityangtian/FPNFormer.
AB - We propose the feature pyramid network transformer decoder (FPNFormer) module, which can effectively handle the arbitrary orientations of objects in remote sensing images while improving the expressiveness and robustness of the model. It is a plug-and-play module that transfers well to various detection models and significantly improves their performance. Specifically, we use the computation of the transformer decoder, whose output depends only weakly on the order of the input data, to address the fact that images may appear in any orientation. We apply it to the feature fusion stage and design two schemes, top-down and bottom-up, to fuse features of different scales, giving the model a stronger ability to perceive objects at different scales and angles. Experiments on commonly used benchmarks (DOTA1.0, DOTA1.5, SSDD, and RSDD) demonstrate that the proposed FPNFormer module significantly improves the performance of multiple arbitrary-oriented object detectors, e.g., a 1.99% mAP improvement for Rotated RetinaNet on DOTA's cross-validation set. On the RSDD dataset, the baseline model with FPNFormer improves the mAP of large objects by 5.1%. Combined with more competitive models, the proposed method achieves 79.39% mAP on the DOTA1.0 dataset. The code is available at https://github.com/bityangtian/FPNFormer.
KW - Arbitrary-oriented object detection
KW - deep learning
KW - rotation-invariant
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85182354894&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3351156
DO - 10.1109/TGRS.2024.3351156
M3 - Article
AN - SCOPUS:85182354894
SN - 0196-2892
VL - 62
SP - 1
EP - 10
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5605610
ER -