TY - GEN
T1 - AFD-Net
T2 - 29th ACM International Conference on Multimedia, MM 2021
AU - Liu, Longyao
AU - Ma, Bo
AU - Zhang, Yulin
AU - Yi, Xin
AU - Li, Haozhi
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/10/17
Y1 - 2021/10/17
N2 - Few-shot object detection (FSOD) aims at learning a detector that can quickly adapt to previously unseen objects with scarce annotated examples. Existing methods solve this problem by performing the subtasks of classification and localization with a shared component in the detector, yet few of them take into consideration the distinct feature-embedding preferences of the two subtasks. In this paper, we carefully analyze the characteristics of FSOD and argue that a few-shot detector should explicitly decompose the two subtasks, as well as leverage information from both of them to enhance feature representations. To this end, we propose a simple yet effective Adaptive Fully-Dual Network (AFD-Net). Specifically, we extend Faster R-CNN by introducing a Dual Query Encoder and a Dual Attention Generator for separate feature extraction, and a Dual Aggregator for separate model reweighting. In this way, separate state estimation is achieved by the R-CNN detector. Furthermore, we introduce an Adaptive Fusion Mechanism to guide the design of encoders for efficient feature fusion in each specific subtask. Extensive experiments on PASCAL VOC and MS COCO show that our approach achieves state-of-the-art performance by a large margin, demonstrating its effectiveness and generalization ability.
AB - Few-shot object detection (FSOD) aims at learning a detector that can quickly adapt to previously unseen objects with scarce annotated examples. Existing methods solve this problem by performing the subtasks of classification and localization with a shared component in the detector, yet few of them take into consideration the distinct feature-embedding preferences of the two subtasks. In this paper, we carefully analyze the characteristics of FSOD and argue that a few-shot detector should explicitly decompose the two subtasks, as well as leverage information from both of them to enhance feature representations. To this end, we propose a simple yet effective Adaptive Fully-Dual Network (AFD-Net). Specifically, we extend Faster R-CNN by introducing a Dual Query Encoder and a Dual Attention Generator for separate feature extraction, and a Dual Aggregator for separate model reweighting. In this way, separate state estimation is achieved by the R-CNN detector. Furthermore, we introduce an Adaptive Fusion Mechanism to guide the design of encoders for efficient feature fusion in each specific subtask. Extensive experiments on PASCAL VOC and MS COCO show that our approach achieves state-of-the-art performance by a large margin, demonstrating its effectiveness and generalization ability.
KW - few-shot object detection
KW - meta-learning
KW - task decomposition
UR - http://www.scopus.com/inward/record.url?scp=85119383437&partnerID=8YFLogxK
U2 - 10.1145/3474085.3475428
DO - 10.1145/3474085.3475428
M3 - Conference contribution
AN - SCOPUS:85119383437
T3 - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
SP - 2549
EP - 2557
BT - MM 2021 - Proceedings of the 29th ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 20 October 2021 through 24 October 2021
ER -