TY - JOUR
T1 - UniHead
T2 - Unifying Multi-Perception for Detection Heads
AU - Zhou, Hantao
AU - Yang, Rui
AU - Zhang, Yachao
AU - Duan, Haoran
AU - Huang, Yawen
AU - Hu, Runze
AU - Li, Xiu
AU - Zheng, Yefeng
N1 - Publisher Copyright:
IEEE
PY - 2024
Y1 - 2024
N2 - The detection head constitutes a pivotal component within object detectors, tasked with executing both classification and localization functions. Regrettably, the commonly used parallel head often lacks omni-perceptual capabilities, such as deformation perception (DP), global perception (GP), and cross-task perception (CTP). Despite numerous methods attempting to enhance these abilities from a single aspect, achieving a comprehensive and unified solution remains a significant challenge. In response to this challenge, we develop an innovative detection head, termed UniHead, to unify three perceptual abilities simultaneously. More precisely, our approach: 1) introduces DP, enabling the model to adaptively sample object features; 2) proposes a dual-axial aggregation transformer (DAT) to adeptly model long-range dependencies, thereby achieving GP; and 3) devises a cross-task interaction transformer (CIT) that facilitates interaction between the classification and localization branches, thus aligning the two tasks. As a plug-and-play method, the proposed UniHead can be conveniently integrated with existing detectors. Extensive experiments on the COCO dataset demonstrate that our UniHead can bring significant improvements to many detectors. For instance, the UniHead can obtain +2.7 AP gains in RetinaNet, +2.9 AP gains in FreeAnchor, and +2.1 AP gains in GFL. The code is available at https://github.com/zht8506/UniHead.
AB - The detection head constitutes a pivotal component within object detectors, tasked with executing both classification and localization functions. Regrettably, the commonly used parallel head often lacks omni-perceptual capabilities, such as deformation perception (DP), global perception (GP), and cross-task perception (CTP). Despite numerous methods attempting to enhance these abilities from a single aspect, achieving a comprehensive and unified solution remains a significant challenge. In response to this challenge, we develop an innovative detection head, termed UniHead, to unify three perceptual abilities simultaneously. More precisely, our approach: 1) introduces DP, enabling the model to adaptively sample object features; 2) proposes a dual-axial aggregation transformer (DAT) to adeptly model long-range dependencies, thereby achieving GP; and 3) devises a cross-task interaction transformer (CIT) that facilitates interaction between the classification and localization branches, thus aligning the two tasks. As a plug-and-play method, the proposed UniHead can be conveniently integrated with existing detectors. Extensive experiments on the COCO dataset demonstrate that our UniHead can bring significant improvements to many detectors. For instance, the UniHead can obtain +2.7 AP gains in RetinaNet, +2.9 AP gains in FreeAnchor, and +2.1 AP gains in GFL. The code is available at https://github.com/zht8506/UniHead.
KW - Detection head
KW - object detection
KW - transformer
KW - unifying multi-perception
UR - http://www.scopus.com/inward/record.url?scp=85196751740&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2024.3412947
DO - 10.1109/TNNLS.2024.3412947
M3 - Article
AN - SCOPUS:85196751740
SN - 2162-237X
SP - 1
EP - 12
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -