Deep neural network based unsupervised video representation

Xinxiao Wu; Kun Wu

doi:10.11860/j.issn.1673-0291.2017.06.002


School of Computer Science

Beijing Institute of Technology

Research output: Contribution to journal › Review article › Peer-review

Abstract

Most video representation methods in computer vision are supervised and require large labeled training video sets, which are expensive to scale up to rapidly growing data. To address this problem, this paper proposes an unsupervised video representation method based on a deep convolutional neural network. Improved dense trajectories (iDT) are used to extract video blocks, which are then used to alternately train the convolutional neural network and update the clusters. The deep convolutional neural network model is trained with an iterative algorithm to obtain unsupervised video representations. The proposed model is applied to extract features on the HMDB51 and CCV datasets for motion recognition and event detection, respectively. In the experiments, it achieves a mean accuracy of 62.6% on HMDB51 and a mean average precision (mAP) of 43.6% on CCV, demonstrating the effectiveness of the proposed method.
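The abstract describes the training only at a high level: video blocks are extracted along iDT trajectories, and the network and the clusters are updated in alternation. The Python sketch below illustrates that general alternating pattern (cluster the current features into pseudo-labels, then train on those pseudo-labels, then repeat) under loudly stated assumptions; it is not the authors' implementation. A linear encoder `W` stands in for the deep CNN, k-means stands in for whichever clustering the paper uses, random vectors stand in for iDT video blocks, and every name (`alternate`, `kmeans`, `train_step`) and hyperparameter is hypothetical.

```python
import math
import random

# ASSUMPTIONS (hypothetical, not from the paper): a linear encoder W stands in
# for the deep CNN, k-means stands in for the clustering step, and plain
# vectors stand in for iDT-extracted video blocks.


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]


def kmeans(points, k, iters=10, rng=None):
    """Plain k-means; returns one cluster index (pseudo-label) per point."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def train_step(W, V, x, y, lr):
    """One SGD step of softmax cross-entropy on pseudo-label y, updating head V and encoder W."""
    f = matvec(W, x)                                   # feature f = W x
    p = softmax(matvec(V, f))                          # probabilities = softmax(V f)
    g = [pi - (1.0 if i == y else 0.0) for i, pi in enumerate(p)]  # dLoss/dlogits
    df = [sum(V[i][j] * g[i] for i in range(len(V))) for j in range(len(f))]  # V^T g
    for i in range(len(V)):
        for j in range(len(f)):
            V[i][j] -= lr * g[i] * f[j]
    for j in range(len(W)):
        for q in range(len(x)):
            W[j][q] -= lr * df[j] * x[q]
    return -math.log(max(p[y], 1e-12))


def alternate(blocks, feat_dim=4, k=3, rounds=3, epochs=5, lr=0.01, seed=0):
    """Alternate between clustering current features and training on the cluster labels."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 0.1) for _ in range(len(blocks[0]))] for _ in range(feat_dim)]
    labels = []
    for _ in range(rounds):
        # (1) Cluster the current features to obtain pseudo-labels.
        labels = kmeans([matvec(W, x) for x in blocks], k, rng=rng)
        # (2) Fresh classifier head each round: cluster indices are arbitrary between rounds.
        V = [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)] for _ in range(k)]
        for _ in range(epochs):
            order = list(range(len(blocks)))
            rng.shuffle(order)
            for i in order:
                train_step(W, V, blocks[i], labels[i], lr)
    return W, labels


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Three hypothetical groups of 6-D "block descriptors", 20 per group.
    blocks = [[data_rng.gauss((c - 1) * 2.0, 0.5) for _ in range(6)]
              for c in range(3) for _ in range(20)]
    W, labels = alternate(blocks)
    print(len(W), len(W[0]), sorted(set(labels)))
```

Re-initialising the classifier head each round is a design choice assumed here because cluster indices carry no meaning from one round to the next; a real system would also need safeguards against degenerate clusterings, which this toy omits.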

Original language	English
Pages (from-to)	8-12
Number of pages	5
Journal	Beijing Jiaotong Daxue Xuebao/Journal of Beijing Jiaotong University
Volume	41
Issue number	6
DOI	https://doi.org/10.11860/j.issn.1673-0291.2017.06.002
Publication status	Published - 1 Dec 2017


Cite this

@article{a9b9ab4f52df40f2a121656224ec71e6,

title = "Deep neural network based unsupervised video representation",

abstract = "Most video representation methods in computer vision are supervised and require large labeled training video sets, which are expensive to scale up to rapidly growing data. To address this problem, this paper proposes an unsupervised video representation method based on a deep convolutional neural network. Improved dense trajectories (iDT) are used to extract video blocks, which are then used to alternately train the convolutional neural network and update the clusters. The deep convolutional neural network model is trained with an iterative algorithm to obtain unsupervised video representations. The proposed model is applied to extract features on the HMDB51 and CCV datasets for motion recognition and event detection, respectively. In the experiments, it achieves a mean accuracy of 62.6\% on HMDB51 and a mean average precision (mAP) of 43.6\% on CCV, demonstrating the effectiveness of the proposed method.",

keywords = "Convolutional neural networks, Unsupervised learning, Video representation",

author = "Xinxiao Wu and Kun Wu",

year = "2017",

month = dec,

day = "1",

doi = "10.11860/j.issn.1673-0291.2017.06.002",

language = "English",

volume = "41",

pages = "8--12",

journal = "Beijing Jiaotong Daxue Xuebao/Journal of Beijing Jiaotong University",

issn = "1673-0291",

publisher = "Journal Northern Jiaotong University",

number = "6",

}

TY - JOUR

T1 - Deep neural network based unsupervised video representation

AU - Wu, Xinxiao

AU - Wu, Kun

PY - 2017/12/1

Y1 - 2017/12/1

N2 - Most video representation methods in computer vision are supervised and require large labeled training video sets, which are expensive to scale up to rapidly growing data. To address this problem, this paper proposes an unsupervised video representation method based on a deep convolutional neural network. Improved dense trajectories (iDT) are used to extract video blocks, which are then used to alternately train the convolutional neural network and update the clusters. The deep convolutional neural network model is trained with an iterative algorithm to obtain unsupervised video representations. The proposed model is applied to extract features on the HMDB51 and CCV datasets for motion recognition and event detection, respectively. In the experiments, it achieves a mean accuracy of 62.6% on HMDB51 and a mean average precision (mAP) of 43.6% on CCV, demonstrating the effectiveness of the proposed method.

AB - Most video representation methods in computer vision are supervised and require large labeled training video sets, which are expensive to scale up to rapidly growing data. To address this problem, this paper proposes an unsupervised video representation method based on a deep convolutional neural network. Improved dense trajectories (iDT) are used to extract video blocks, which are then used to alternately train the convolutional neural network and update the clusters. The deep convolutional neural network model is trained with an iterative algorithm to obtain unsupervised video representations. The proposed model is applied to extract features on the HMDB51 and CCV datasets for motion recognition and event detection, respectively. In the experiments, it achieves a mean accuracy of 62.6% on HMDB51 and a mean average precision (mAP) of 43.6% on CCV, demonstrating the effectiveness of the proposed method.

KW - Convolutional neural networks

KW - Unsupervised learning

KW - Video representation

UR - http://www.scopus.com/inward/record.url?scp=85048325119&partnerID=8YFLogxK

U2 - 10.11860/j.issn.1673-0291.2017.06.002

DO - 10.11860/j.issn.1673-0291.2017.06.002

M3 - Review article

AN - SCOPUS:85048325119

SN - 1673-0291

VL - 41

SP - 8

EP - 12

JO - Beijing Jiaotong Daxue Xuebao/Journal of Beijing Jiaotong University

JF - Beijing Jiaotong Daxue Xuebao/Journal of Beijing Jiaotong University

IS - 6

ER -
