Region-adaptive Concept Aggregation for Few-shot Visual Recognition

Mengya Han, Yibing Zhan, Baosheng Yu, Yong Luo*, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Few-shot learning (FSL) aims to learn novel concepts from very limited examples. However, most FSL methods lack robustness in concept learning. Specifically, existing FSL methods usually ignore the diversity of region contents, which may contain concept-irrelevant information such as the background; this introduces bias/noise and degrades the quality of conceptual representation learning. To address this issue, we propose a novel metric-based FSL method termed the region-adaptive concept aggregation network (RCA-Net). Specifically, we devise a region-adaptive concept aggregator (RCA) to model the relationships among different regions and capture the conceptual information within each region, which is then integrated in a weighted-average manner to obtain the conceptual representation. Consequently, robust concept learning can be achieved by focusing more on concept-relevant information and less on concept-irrelevant information. We perform extensive experiments on three popular visual recognition benchmarks to demonstrate the superiority of RCA-Net for robust few-shot learning. In particular, on the Caltech-UCSD Birds-200-2011 (CUB200) dataset, the proposed RCA-Net significantly improves 1-shot accuracy from 74.76% to 78.03% and 5-shot accuracy from 86.84% to 89.83% compared with the most competitive counterpart.
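To make the aggregation idea in the abstract concrete, the following is a minimal sketch of region-adaptive weighted aggregation, not the authors' RCA implementation: it assumes region features are the flattened spatial locations of a CNN feature map, and the class/module names (RegionAdaptiveAggregator, scorer) are hypothetical illustrations only.

```python
# Hypothetical sketch (PyTorch): weight each region by a learned relevance score,
# then take a weighted average to form a single conceptual representation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegionAdaptiveAggregator(nn.Module):
    """Aggregate per-region features into one conceptual representation
    using input-dependent region weights (illustrative, not the paper's code)."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Small scoring head mapping each region feature to a scalar relevance score.
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, feat_dim // 4),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim // 4, 1),
        )

    def forward(self, region_feats: torch.Tensor) -> torch.Tensor:
        # region_feats: (batch, num_regions, feat_dim),
        # e.g. a 7x7 feature map flattened into 49 regions.
        scores = self.scorer(region_feats)             # (batch, num_regions, 1)
        weights = F.softmax(scores, dim=1)             # normalize over regions
        concept = (weights * region_feats).sum(dim=1)  # weighted average -> (batch, feat_dim)
        return concept


# Usage: aggregate a 7x7 feature map (49 regions) of 640-d features.
feats = torch.randn(4, 49, 640)
aggregator = RegionAdaptiveAggregator(feat_dim=640)
print(aggregator(feats).shape)  # torch.Size([4, 640])
```

In such a scheme, regions dominated by concept-irrelevant content (e.g., background) receive low weights, so the aggregated representation is driven mainly by concept-relevant regions; the paper's RCA additionally models inter-region relationships before weighting.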

Original language: English
Pages (from-to): 554-568
Number of pages: 15
Journal: Machine Intelligence Research
Volume: 20
Issue number: 4
DOIs
Publication status: Published - Aug 2023

Keywords

  • Few-shot learning
  • concept learning
  • concept-aggregation
  • metric-based meta learning
  • region-adaptive
