TY - JOUR
T1 - UniHead
T2 - Unifying Multi-Perception for Detection Heads
AU - Zhou, Hantao
AU - Yang, Rui
AU - Zhang, Yachao
AU - Duan, Haoran
AU - Huang, Yawen
AU - Hu, Runze
AU - Li, Xiu
AU - Zheng, Yefeng
N1 - Publisher Copyright:
IEEE
PY - 2024
Y1 - 2024
N2 - The detection head constitutes a pivotal component within object detectors, tasked with executing both classification and localization functions. Regrettably, the commonly used parallel head often lacks omni-perceptual capabilities, such as deformation perception (DP), global perception (GP), and cross-task perception (CTP). Despite numerous methods attempting to enhance these abilities from a single aspect, achieving a comprehensive and unified solution remains a significant challenge. In response to this challenge, we develop an innovative detection head, termed UniHead, to unify three perceptual abilities simultaneously. More precisely, our approach: 1) introduces DP, enabling the model to adaptively sample object features; 2) proposes a dual-axial aggregation transformer (DAT) to adeptly model long-range dependencies, thereby achieving GP; and 3) devises a cross-task interaction transformer (CIT) that facilitates interaction between the classification and localization branches, thus aligning the two tasks. As a plug-and-play method, the proposed UniHead can be conveniently integrated with existing detectors. Extensive experiments on the COCO dataset demonstrate that our UniHead can bring significant improvements to many detectors. For instance, the UniHead can obtain +2.7 AP gains in RetinaNet, +2.9 AP gains in FreeAnchor, and +2.1 AP gains in GFL. The code is available at https://github.com/zht8506/UniHead.
AB - The detection head constitutes a pivotal component within object detectors, tasked with executing both classification and localization functions. Regrettably, the commonly used parallel head often lacks omni-perceptual capabilities, such as deformation perception (DP), global perception (GP), and cross-task perception (CTP). Despite numerous methods attempting to enhance these abilities from a single aspect, achieving a comprehensive and unified solution remains a significant challenge. In response to this challenge, we develop an innovative detection head, termed UniHead, to unify three perceptual abilities simultaneously. More precisely, our approach: 1) introduces DP, enabling the model to adaptively sample object features; 2) proposes a dual-axial aggregation transformer (DAT) to adeptly model long-range dependencies, thereby achieving GP; and 3) devises a cross-task interaction transformer (CIT) that facilitates interaction between the classification and localization branches, thus aligning the two tasks. As a plug-and-play method, the proposed UniHead can be conveniently integrated with existing detectors. Extensive experiments on the COCO dataset demonstrate that our UniHead can bring significant improvements to many detectors. For instance, the UniHead can obtain +2.7 AP gains in RetinaNet, +2.9 AP gains in FreeAnchor, and +2.1 AP gains in GFL. The code is available at https://github.com/zht8506/UniHead.
KW - Detection head
KW - object detection
KW - transformer
KW - unifying multi-perception
UR - http://www.scopus.com/inward/record.url?scp=85196751740&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2024.3412947
DO - 10.1109/TNNLS.2024.3412947
M3 - Article
AN - SCOPUS:85196751740
SN - 2162-237X
SP - 1
EP - 12
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -