Can Categories and Attributes Be Learned in a Multi-Task Way?

  • Shu Yang
  • Yaowei Wang
  • Yemin Shi
  • Zesong Fei*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

5 Citations (Scopus)

Abstract

Intuitively, object recognition and attribute prediction are correlated tasks. However, they appeared to conflict in a simple two-branch multi-task framework (a category branch and an attribute branch) with a shared backbone (convolutional and pooling layers): performance dropped as training iterations proceeded. This result might have been caused by the incoherent feature distributions of the object recognition features and the attribute prediction features. Recognition features are discriminative across categories and insensitive to intra-category variations, whereas attribute prediction features are discriminative across attributes, even though different attributes can occur in objects from the same category. A conflict therefore arises when the network is forced to learn these two distinct kinds of features simultaneously. To address this problem, we propose the category and attribute prediction network (CAP-net), which introduces a category-constrained attribute prediction structure to transfer object recognition knowledge and avoid the conflict between the two kinds of features. The CAP-net parameters can be learned easily with a regularization method. Extensive experimental results show that CAP-net outperforms state-of-the-art methods on object recognition and attribute prediction tasks.
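As a rough illustration of the naive two-branch baseline the abstract describes (not CAP-net itself), the sketch below computes a joint loss from one shared feature vector feeding a category head and an attribute head. The loss choices (softmax cross-entropy for categories, averaged binary cross-entropy for attributes), the `attr_weight` factor, the layer sizes, and every function and parameter name are assumptions made for illustration; the abstract does not specify them.

```python
import math

def linear(weights, bias, x):
    # Dense layer: one output per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def two_branch_loss(x, category, attributes, params, attr_weight=1.0):
    # Shared "backbone" (a single dense layer stands in for conv + pooling).
    shared = relu(linear(params["backbone_w"], params["backbone_b"], x))
    # Category branch: softmax cross-entropy (assumed loss).
    cat_probs = softmax(linear(params["cat_w"], params["cat_b"], shared))
    cat_loss = -math.log(cat_probs[category])
    # Attribute branch: independent sigmoid outputs with averaged
    # binary cross-entropy (assumed loss).
    attr_logits = linear(params["attr_w"], params["attr_b"], shared)
    attr_probs = [sigmoid(z) for z in attr_logits]
    attr_loss = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(attributes, attr_probs)
    ) / len(attributes)
    # Both losses backpropagate into the same shared features; this is
    # where the abstract says the two tasks conflict.
    return cat_loss + attr_weight * attr_loss, cat_loss, attr_loss

# Hypothetical toy sizes: 3 inputs, 2 shared features, 3 categories, 2 attributes.
params = {
    "backbone_w": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    "backbone_b": [0.0, 0.0],
    "cat_w": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    "cat_b": [0.0, 0.0, 0.0],
    "attr_w": [[0.0, 0.0], [0.0, 0.0]],
    "attr_b": [0.0, 0.0],
}
total, cat_loss, attr_loss = two_branch_loss(
    [1.0, 2.0, 3.0], category=0, attributes=[1, 0], params=params
)
print(total, cat_loss, attr_loss)
```

With all-zero weights both heads predict uniformly, so the category loss equals log 3 and the attribute loss equals log 2, which makes the toy call easy to check by hand.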

Original language: English
Article number: 8723592
Pages (from-to): 3194-3204
Number of pages: 11
Journal: IEEE Transactions on Multimedia
Volume: 21
Issue number: 12
Publication status: Published - Dec 2019

Keywords

  • Multi-task learning
  • category-constrained attribute prediction
  • regularization
