Video annotation via image groups from the web

Han Wang; Xinxiao Wu; Yunde Jia

doi:10.1109/TMM.2014.2312251

Corresponding author: Xinxiao Wu

School of Computer Science

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

21 Citations (Scopus)

Abstract

Searching for desired events in uncontrolled videos is a challenging task. Current research mainly focuses on learning concepts from large numbers of labeled videos, but collecting enough labeled videos to train event models under various circumstances is time-consuming and labor-intensive. To alleviate this problem, we propose to leverage abundant Web images for video annotation, since Web images are a rich source of information in which many events are roughly annotated and captured under various conditions. However, knowledge from the Web is noisy and diverse, so brute-force transfer of image knowledge may hurt video annotation performance. Therefore, we propose a novel Group-based Domain Adaptation (GDA) learning framework to transfer different groups of knowledge (source domain) queried from a Web image search engine to consumer videos (target domain). Unlike traditional methods that use multiple source domains of images, our method organizes the Web images according to their intrinsic semantic relationships rather than their sources. Specifically, two types of groups (i.e., event-specific groups and concept-specific groups) are exploited to describe the event-level and concept-level semantics of target-domain videos, respectively. Under this framework, we assign different weights to different image groups according to the relevance between each source group and the target domain; each group weight represents how much the corresponding source image group contributes to the knowledge transferred to the target videos. To make the group weights and group classifiers mutually beneficial, a joint optimization algorithm is presented that learns the weights and classifiers simultaneously, using two novel data-dependent regularizers. Experimental results on three challenging video datasets (i.e., CCV, Kodak, and YouTube) demonstrate the effectiveness of leveraging grouped knowledge gained from Web images for video annotation.

Original language	English
Article number	6856249
Pages (from-to)	1282-1291
Number of pages	10
Journal	IEEE Transactions on Multimedia
Volume	16
Issue number	5
DOI	https://doi.org/10.1109/TMM.2014.2312251
Publication status	Published - Aug 2014

Cite this

@article{0550a89ba61543368d1cf6185d2ccf3f,

title = "Video annotation via image groups from the web",

keywords = "Concept-specific group, domain adaptation, event-specific group, video annotation",

author = "Han Wang and Xinxiao Wu and Yunde Jia",

year = "2014",

month = aug,

doi = "10.1109/TMM.2014.2312251",

language = "English",

volume = "16",

pages = "1282--1291",

journal = "IEEE Transactions on Multimedia",

issn = "1520-9210",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "5",

}

TY - JOUR

T1 - Video annotation via image groups from the web

AU - Wang, Han

AU - Wu, Xinxiao

AU - Jia, Yunde

PY - 2014/8

Y1 - 2014/8

KW - Concept-specific group

KW - domain adaptation

KW - event-specific group

KW - video annotation

UR - http://www.scopus.com/inward/record.url?scp=84904728918&partnerID=8YFLogxK

U2 - 10.1109/TMM.2014.2312251

DO - 10.1109/TMM.2014.2312251

M3 - Article

AN - SCOPUS:84904728918

SN - 1520-9210

VL - 16

SP - 1282

EP - 1291

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

IS - 5

M1 - 6856249

ER -
