TY - JOUR
T1 - HCDet: hidden X-ray contraband detection based on HyAtt-CNN and local implicit feature pyramid network
AU - Wang, Zhihan
AU - Du, Huiqian
AU - Xie, Min
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.
PY - 2025/6
Y1 - 2025/6
N2 - Detecting contraband in X-ray images is challenging due to heavy object overlap. In this paper, we present HCDet, a novel framework that introduces key innovations to tackle this challenge. First, we propose Hybrid Self-Attention (HyAtt), a lightweight self-attention mechanism that balances reduced computational cost against high-quality feature extraction. Building on this, we develop the HyAtt-CNN block, a layer aggregation module that employs a split, process (CNN and Transformer), and merge strategy to enhance feature aggregation. This hybrid design combines local and global context representations, enabling the backbone network to more effectively address the complexities of hidden contraband, particularly when items heavily overlap. Additionally, we propose the Local Implicit Feature Pyramid Network (LIFPN), a novel detection neck that utilizes implicit feature functions to resolve ambiguities in feature fusion. By employing implicit representations, LIFPN enhances low-resolution features and magnifies them to higher resolutions, reducing feature blurring and enabling precise multi-scale fusion. By integrating HyAtt-CNN and LIFPN, HCDet provides a robust and efficient solution for detecting hidden contraband, significantly improving detection accuracy. HCDet-S achieves a mAP50:95 score of 66% on the PIDRay hidden test set, which is about 3.6% higher than YOLOv8-S with the same model size. Extensive experiments demonstrate the effectiveness of HCDet in overcoming the challenges posed by overlapping items in X-ray images.
AB - Detecting contraband in X-ray images is challenging due to heavy object overlap. In this paper, we present HCDet, a novel framework that introduces key innovations to tackle this challenge. First, we propose Hybrid Self-Attention (HyAtt), a lightweight self-attention mechanism that balances reduced computational cost against high-quality feature extraction. Building on this, we develop the HyAtt-CNN block, a layer aggregation module that employs a split, process (CNN and Transformer), and merge strategy to enhance feature aggregation. This hybrid design combines local and global context representations, enabling the backbone network to more effectively address the complexities of hidden contraband, particularly when items heavily overlap. Additionally, we propose the Local Implicit Feature Pyramid Network (LIFPN), a novel detection neck that utilizes implicit feature functions to resolve ambiguities in feature fusion. By employing implicit representations, LIFPN enhances low-resolution features and magnifies them to higher resolutions, reducing feature blurring and enabling precise multi-scale fusion. By integrating HyAtt-CNN and LIFPN, HCDet provides a robust and efficient solution for detecting hidden contraband, significantly improving detection accuracy. HCDet-S achieves a mAP50:95 score of 66% on the PIDRay hidden test set, which is about 3.6% higher than YOLOv8-S with the same model size. Extensive experiments demonstrate the effectiveness of HCDet in overcoming the challenges posed by overlapping items in X-ray images.
KW - Feature pyramid network
KW - Object detection
KW - Self-attention
KW - X-ray image
UR - http://www.scopus.com/inward/record.url?scp=105004425709&partnerID=8YFLogxK
U2 - 10.1007/s00530-025-01831-4
DO - 10.1007/s00530-025-01831-4
M3 - Article
AN - SCOPUS:105004425709
SN - 0942-4962
VL - 31
JO - Multimedia Systems
JF - Multimedia Systems
IS - 3
M1 - 236
ER -