TY - GEN
T1 - Towards low bit rate mobile visual search with multiple-channel coding
AU - Ji, Rongrong
AU - Duan, Ling Yu
AU - Chen, Jie
AU - Yao, Hongxun
AU - Rui, Yong
AU - Chang, Shih Fu
AU - Gao, Wen
PY - 2011
Y1 - 2011
N2 - In this paper, we propose a multiple-channel coding scheme to extract compact visual descriptors for low bit rate mobile visual search. Unlike previous visual search scenarios that send the query image, we make use of the ever-growing mobile computational capability to extract compact visual descriptors directly at the mobile end. Moving beyond state-of-the-art compact descriptor extraction, we exploit the rich contextual cues at the mobile end (such as GPS tags for mobile visual search and 2D barcodes or RFID tags for mobile product search), together with the visual statistics at the reference database, to learn multiple coding channels. We therefore describe the query with one of many forms of high-dimensional visual signature, which is subsequently mapped to one or more channels and compressed. The compression function within each channel is learnt with a novel robust PCA scheme designed to preserve the retrieval ranking capability of the original signature. We have deployed our scheme on both iPhone4 and HTC DESIRE 7 to search ten million landmark images in a low bit rate setting. Quantitative comparisons with the state of the art demonstrate significant advantages in descriptor compactness (an improvement of orders of magnitude) and retrieval mAP in mobile landmark, product, and CD/book cover search.
AB - In this paper, we propose a multiple-channel coding scheme to extract compact visual descriptors for low bit rate mobile visual search. Unlike previous visual search scenarios that send the query image, we make use of the ever-growing mobile computational capability to extract compact visual descriptors directly at the mobile end. Moving beyond state-of-the-art compact descriptor extraction, we exploit the rich contextual cues at the mobile end (such as GPS tags for mobile visual search and 2D barcodes or RFID tags for mobile product search), together with the visual statistics at the reference database, to learn multiple coding channels. We therefore describe the query with one of many forms of high-dimensional visual signature, which is subsequently mapped to one or more channels and compressed. The compression function within each channel is learnt with a novel robust PCA scheme designed to preserve the retrieval ranking capability of the original signature. We have deployed our scheme on both iPhone4 and HTC DESIRE 7 to search ten million landmark images in a low bit rate setting. Quantitative comparisons with the state of the art demonstrate significant advantages in descriptor compactness (an improvement of orders of magnitude) and retrieval mAP in mobile landmark, product, and CD/book cover search.
KW - Compact descriptor
KW - Contextual learning
KW - Data compression
KW - Mobile visual search
KW - Wireless communication
UR - http://www.scopus.com/inward/record.url?scp=84455212226&partnerID=8YFLogxK
U2 - 10.1145/2072298.2072372
DO - 10.1145/2072298.2072372
M3 - Conference contribution
AN - SCOPUS:84455212226
SN - 9781450306164
T3 - MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops
SP - 573
EP - 582
BT - MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops
T2 - 19th ACM International Conference on Multimedia, ACM Multimedia 2011, MM'11
Y2 - 28 November 2011 through 1 December 2011
ER -