TY - GEN
T1 - Generalized pooling pyramid with hierarchical dictionary sparse coding for event and object recognition
AU - Chen, Shuai
AU - Ma, Bo
AU - Luo, Pei
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/2
Y1 - 2017/7/2
N2 - Feature coding and vector pooling are essential for image recognition in the bag-of-visual-words (BoW) method. Encoding low-level features into rich ones and pooling them without information loss are very challenging tasks. In this paper, a generalized pooling pyramid with hierarchical dictionary sparse coding is introduced to obtain rich sparse codes and alleviate information loss in the pooling phase. It includes two modules. First, a hierarchical dictionary is learned from the low-level features for sparse coding, generating a hierarchical sparse representation. Second, in the vector pooling phase, we present a generalized pooling pyramid that uses a probabilistic function to model the statistical distribution of the sparse codes. In the generalized pooling pyramid, Fisher vectors computed with a Gaussian mixture model (GMM) at different levels are fused to represent the images. Our method outperforms state-of-the-art methods in a large number of image categorization experiments on an event dataset (UIUC-Sport) and an object recognition dataset (Caltech101).
AB - Feature coding and vector pooling are essential for image recognition in the bag-of-visual-words (BoW) method. Encoding low-level features into rich ones and pooling them without information loss are very challenging tasks. In this paper, a generalized pooling pyramid with hierarchical dictionary sparse coding is introduced to obtain rich sparse codes and alleviate information loss in the pooling phase. It includes two modules. First, a hierarchical dictionary is learned from the low-level features for sparse coding, generating a hierarchical sparse representation. Second, in the vector pooling phase, we present a generalized pooling pyramid that uses a probabilistic function to model the statistical distribution of the sparse codes. In the generalized pooling pyramid, Fisher vectors computed with a Gaussian mixture model (GMM) at different levels are fused to represent the images. Our method outperforms state-of-the-art methods in a large number of image categorization experiments on an event dataset (UIUC-Sport) and an object recognition dataset (Caltech101).
UR - http://www.scopus.com/inward/record.url?scp=85045342380&partnerID=8YFLogxK
U2 - 10.1109/ICIP.2017.8296702
DO - 10.1109/ICIP.2017.8296702
M3 - Conference contribution
AN - SCOPUS:85045342380
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 2349
EP - 2353
BT - 2017 IEEE International Conference on Image Processing, ICIP 2017 - Proceedings
PB - IEEE Computer Society
T2 - 24th IEEE International Conference on Image Processing, ICIP 2017
Y2 - 17 September 2017 through 20 September 2017
ER -