TY - JOUR
T1 - Improved YOLOv7 Object Detection Algorithm for Fisheye Images
AU - Wu, Zhaodong
AU - Xu, Cheng
AU - Liu, Hongzhe
AU - Fu, Ying
AU - Jian, Muwei
N1 - Publisher Copyright:
© 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
PY - 2024/7/15
Y1 - 2024/7/15
N2 - Images captured by fisheye cameras are characterized by a wide field of view, geometric distortion and large scale variation, which pose great challenges to object detectors built on general-purpose convolutional networks. Existing object detection algorithms leave room for improvement in network structure design and feature learning before they are well suited to detecting distorted objects in fisheye images. To mitigate the effect of radial distortion in fisheye images, a multi-head attention module with a multi-branch stacking structure is introduced into the YOLOv7 backbone to capture global contextual information. In addition, a simple and efficient layer aggregation structure that incorporates deformable convolutions is used in the YOLOv7 neck to achieve effective multi-scale feature fusion. Experiments on the public comprehensive fisheye image dataset VOC_360 show that the improved YOLOv7 fisheye object detector achieves an mAP50 of 84.3% and an mAP50:95 of 70.4%, which are 3.1 and 6.4 percentage points higher, respectively, than those of the baseline YOLOv7 model.
KW - deformable convolution
KW - fisheye image
KW - multi-head attention
KW - object detection
KW - YOLO algorithm
UR - http://www.scopus.com/inward/record.url?scp=105007345916&partnerID=8YFLogxK
U2 - 10.3778/j.issn.1002-8331.2305-0442
DO - 10.3778/j.issn.1002-8331.2305-0442
M3 - Article
AN - SCOPUS:105007345916
SN - 1002-8331
VL - 60
SP - 250
EP - 256
JO - Computer Engineering and Applications
JF - Computer Engineering and Applications
IS - 14
ER -