TY - GEN
T1 - Inference Adaptive Thresholding based Non-Maximum Suppression for Object Detection in Video Image Sequence
AU - Jiang, Mengqing
AU - Jiang, Yurong
AU - Li, Min
AU - Meng, Bo
AU - Song, Hong
AU - Ai, Danni
AU - Yang, Jian
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019
Y1 - 2019
N2 - This study proposes a novel inference adaptive thresholding based non-maximum suppression (NMS) algorithm, IAT-NMS, for deriving temporal cues between video sequences. Temporal connectivity is first inferred from an overlap measure of the bounding boxes in adjacent frames. Frames with high-confidence detected objects are taken as key frames to leverage the scores of neighboring detections and preserve potential detections of blurred objects with low scores. Then, the bounding boxes within each frame are ranked by their confidence scores, and the overlap ratio between the highest-scoring bounding box and each of the remaining surrounding boxes is computed. This overlap measure is passed to a Gaussian function to estimate weights for adaptive suppression and to softly suppress the detection scores of possibly severely overlapped objects. The proposed method is compared with state-of-the-art video object detection techniques. With the application of IAT-NMS, overlapping objects originally indistinguishable in the compared methods become detectable. Experimental results demonstrate that this simple and unsupervised method outperforms state-of-the-art NMS algorithms, with an increase of 6% in mean average precision (mAP) on the ImageNet VID dataset. Our study on performance limitations and sensitivity to parametric variations also finds that IAT-NMS demonstrates better detection capability than do the three compared algorithms, which fail to detect all targets or to distinguish them in the presence of multiple overlapping targets.
AB - This study proposes a novel inference adaptive thresholding based non-maximum suppression (NMS) algorithm, IAT-NMS, for deriving temporal cues between video sequences. Temporal connectivity is first inferred from an overlap measure of the bounding boxes in adjacent frames. Frames with high-confidence detected objects are taken as key frames to leverage the scores of neighboring detections and preserve potential detections of blurred objects with low scores. Then, the bounding boxes within each frame are ranked by their confidence scores, and the overlap ratio between the highest-scoring bounding box and each of the remaining surrounding boxes is computed. This overlap measure is passed to a Gaussian function to estimate weights for adaptive suppression and to softly suppress the detection scores of possibly severely overlapped objects. The proposed method is compared with state-of-the-art video object detection techniques. With the application of IAT-NMS, overlapping objects originally indistinguishable in the compared methods become detectable. Experimental results demonstrate that this simple and unsupervised method outperforms state-of-the-art NMS algorithms, with an increase of 6% in mean average precision (mAP) on the ImageNet VID dataset. Our study on performance limitations and sensitivity to parametric variations also finds that IAT-NMS demonstrates better detection capability than do the three compared algorithms, which fail to detect all targets or to distinguish them in the presence of multiple overlapping targets.
KW - Non-maximum suppression
KW - Object detection
KW - Video image
UR - http://www.scopus.com/inward/record.url?scp=85066489551&partnerID=8YFLogxK
U2 - 10.1145/3319921.3319950
DO - 10.1145/3319921.3319950
M3 - Conference contribution
AN - SCOPUS:85066489551
SN - 9781450361286
T3 - ACM International Conference Proceeding Series
SP - 21
EP - 27
BT - ACM International Conference Proceeding Series
PB - Association for Computing Machinery
T2 - 3rd International Conference on Innovation in Artificial Intelligence, ICIAI 2019
Y2 - 15 March 2019 through 18 March 2019
ER -