Cross-domain structural model for video event annotation via web images

Han Wang, Xiabi Liu*, Xinxiao Wu, Yunde Jia

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Annotating events in uncontrolled videos is a challenging task. Most previous work focuses on obtaining concepts from numerous labeled videos, but it is extremely time-consuming and labor-intensive to collect the large number of labeled videos required to model events under various circumstances. In this paper, we learn models for video event annotation by leveraging abundant Web images, which are a rich source of information covering many events captured under various conditions and are roughly annotated as well. Our method is based on a new discriminative structural model, called the Cross-Domain Structural Model (CDSM), which transfers knowledge from Web images (source domain) to consumer videos (target domain) by jointly modeling the interaction between videos and images. Specifically, under this framework we build a common feature subspace to deal with the feature distribution mismatch between the video domain and the image domain. Further, we propose to describe events with weak semantic attributes, which can be obtained with little or no labor. Experimental results on challenging video datasets demonstrate the effectiveness of our transfer learning method.
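To make the common-subspace idea concrete, the sketch below illustrates the general pattern of this kind of transfer: project labeled source-domain (image) features and unlabeled target-domain (video) features into a shared low-dimensional subspace, train a classifier on the projected source data, and score the target data. This is a minimal illustration only, not the authors' CDSM; the synthetic data, the PCA-based projection standing in for the jointly learned subspace, and the least-squares classifier standing in for the structural model are all assumptions.

```python
import numpy as np

# Toy illustration (not the authors' CDSM): align image (source) and video
# (target) features in a shared subspace, train on labeled source data,
# and annotate the target videos. All shapes and data are synthetic.

rng = np.random.default_rng(0)

# Synthetic features: 200 labeled Web images, 50 unlabeled videos, 100-D,
# with a deliberate shift between the two domains.
X_img = rng.normal(size=(200, 100)) + 0.5          # source domain
y_img = (X_img[:, 0] > 0.5).astype(int)            # rough event labels
X_vid = rng.normal(size=(50, 100)) - 0.5           # target domain (shifted)

def zscore(X):
    """Per-domain standardization reduces first-order distribution mismatch."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

X_img_n, X_vid_n = zscore(X_img), zscore(X_vid)

# Shared subspace: top principal directions of the pooled, standardized data.
# (CDSM learns the subspace jointly with the model; PCA is a simple stand-in.)
pooled = np.vstack([X_img_n, X_vid_n])
_, _, Vt = np.linalg.svd(pooled - pooled.mean(axis=0), full_matrices=False)
W = Vt[:20].T                                      # 100-D -> 20-D projection

Z_img, Z_vid = X_img_n @ W, X_vid_n @ W

# Least-squares linear classifier trained on projected source features.
w = np.linalg.lstsq(Z_img, 2.0 * y_img - 1.0, rcond=None)[0]
video_event_scores = Z_vid @ w                     # score the target videos
print(video_event_scores[:5])
```

The design point the sketch captures is that the classifier never sees raw video features: both domains pass through the same projection, so a model fit on cheaply labeled Web images can be applied to videos despite the distribution shift.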

Original language: English
Pages (from-to): 10439-10456
Number of pages: 18
Journal: Multimedia Tools and Applications
Volume: 74
Issue number: 23
Publication status: Published - Dec 2015

Keywords

  • Knowledge transfer
  • Video analysis
  • Video annotation
