AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization

Xiao Yu Zhang; Changsheng Li; Haichao Shi; Xiaobin Zhu; Peng Li; Jing Dong

doi:10.1109/TNNLS.2019.2962815

Xiao Yu Zhang, Changsheng Li, Haichao Shi^*, Xiaobin Zhu, Peng Li, Jing Dong

^*Corresponding author for this work

School of Computer Science

Research output: Contribution to journal › Article › Peer-reviewed

32 citations (Scopus)

Abstract

The point process is a solid framework to model sequential data, such as videos, by exploring the underlying relevance. As a challenging problem for high-level video understanding, weakly supervised action recognition and localization in untrimmed videos have attracted intensive research attention. Knowledge transfer by leveraging the publicly available trimmed videos as external guidance is a promising attempt to make up for the coarse-grained video-level annotation and improve the generalization performance. However, unconstrained knowledge transfer may bring about irrelevant noise and jeopardize the learning model. This article proposes a novel adaptability decomposing encoder-decoder network to transfer reliable knowledge between the trimmed and untrimmed videos for action recognition and localization by bidirectional point process modeling, given only video-level annotations. By decomposing the original features into the domain-adaptable and domain-specific ones based on their adaptability, trimmed-untrimmed knowledge transfer can be safely confined within a more coherent subspace. An encoder-decoder-based structure is carefully designed and jointly optimized to facilitate effective action classification and temporal localization. Extensive experiments are conducted on two benchmark data sets (i.e., THUMOS14 and ActivityNet1.3), and the experimental results clearly corroborate the efficacy of our method.

Original language	English
Pages (from-to)	1852-1863
Number of pages	12
Journal	IEEE Transactions on Neural Networks and Learning Systems
Volume	34
Issue	4
DOI	https://doi.org/10.1109/TNNLS.2019.2962815
Publication status	Published - 1 Apr 2023

Cite this

@article{fb73783e02da470d854419dfe384d51c,

title = "AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization",

abstract = "The point process is a solid framework to model sequential data, such as videos, by exploring the underlying relevance. As a challenging problem for high-level video understanding, weakly supervised action recognition and localization in untrimmed videos have attracted intensive research attention. Knowledge transfer by leveraging the publicly available trimmed videos as external guidance is a promising attempt to make up for the coarse-grained video-level annotation and improve the generalization performance. However, unconstrained knowledge transfer may bring about irrelevant noise and jeopardize the learning model. This article proposes a novel adaptability decomposing encoder-decoder network to transfer reliable knowledge between the trimmed and untrimmed videos for action recognition and localization by bidirectional point process modeling, given only video-level annotations. By decomposing the original features into the domain-adaptable and domain-specific ones based on their adaptability, trimmed-untrimmed knowledge transfer can be safely confined within a more coherent subspace. An encoder-decoder-based structure is carefully designed and jointly optimized to facilitate effective action classification and temporal localization. Extensive experiments are conducted on two benchmark data sets (i.e., THUMOS14 and ActivityNet1.3), and the experimental results clearly corroborate the efficacy of our method.",

keywords = "Action recognition, encoder-decoder, knowledge transfer, point process, temporal action localization",

author = "Zhang, {Xiao Yu} and Changsheng Li and Haichao Shi and Xiaobin Zhu and Peng Li and Jing Dong",

note = "Publisher Copyright: {\textcopyright} 2012 IEEE.",

year = "2023",

month = apr,

day = "1",

doi = "10.1109/TNNLS.2019.2962815",

language = "English",

volume = "34",

pages = "1852--1863",

journal = "IEEE Transactions on Neural Networks and Learning Systems",

issn = "2162-237X",

publisher = "IEEE Computational Intelligence Society",

number = "4",

}

TY - JOUR

T1 - AdapNet

T2 - Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization

AU - Zhang, Xiao Yu

AU - Li, Changsheng

AU - Shi, Haichao

AU - Zhu, Xiaobin

AU - Li, Peng

AU - Dong, Jing

PY - 2023/4/1

Y1 - 2023/4/1

N2 - The point process is a solid framework to model sequential data, such as videos, by exploring the underlying relevance. As a challenging problem for high-level video understanding, weakly supervised action recognition and localization in untrimmed videos have attracted intensive research attention. Knowledge transfer by leveraging the publicly available trimmed videos as external guidance is a promising attempt to make up for the coarse-grained video-level annotation and improve the generalization performance. However, unconstrained knowledge transfer may bring about irrelevant noise and jeopardize the learning model. This article proposes a novel adaptability decomposing encoder-decoder network to transfer reliable knowledge between the trimmed and untrimmed videos for action recognition and localization by bidirectional point process modeling, given only video-level annotations. By decomposing the original features into the domain-adaptable and domain-specific ones based on their adaptability, trimmed-untrimmed knowledge transfer can be safely confined within a more coherent subspace. An encoder-decoder-based structure is carefully designed and jointly optimized to facilitate effective action classification and temporal localization. Extensive experiments are conducted on two benchmark data sets (i.e., THUMOS14 and ActivityNet1.3), and the experimental results clearly corroborate the efficacy of our method.

KW - Action recognition

KW - encoder-decoder

KW - knowledge transfer

KW - point process

KW - temporal action localization

UR - http://www.scopus.com/inward/record.url?scp=85079467157&partnerID=8YFLogxK

U2 - 10.1109/TNNLS.2019.2962815

DO - 10.1109/TNNLS.2019.2962815

M3 - Article

C2 - 31995502

AN - SCOPUS:85079467157

SN - 2162-237X

VL - 34

SP - 1852

EP - 1863

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

IS - 4

ER -
