Uni-IL: Unified Incremental Learning of Vision-Language Models via Mixture of Attribute-Guided Experts

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

With the advent of parameter-efficient fine-tuning techniques for pre-trained vision-language models, interest has grown in adapting them to various incremental learning scenarios, namely sequential increments in tasks, classes, and domains. However, no high-performance incremental learning framework has yet integrated these three scenarios to achieve Unified Incremental Learning (Uni-IL) in complex settings. In this work, we propose an incremental learning framework called Mixture of Attribute-Guided Experts (MAGE) to alleviate long-term forgetting in vision-language model incremental learning. Our approach acquires image attribute knowledge from large language models (LLMs) to form an attribute pool. We then match the most relevant attributes and feed them as inputs to a Mixture of Experts (MoE) that fine-tunes the pre-trained CLIP model. The expert routers learn to select specific expert combinations based on the data and attribute features, which alleviates catastrophic forgetting. Because the attribute pool incorporates both domain and class knowledge, our approach adapts to all three types of incremental learning scenarios and thus supports unified incremental learning. Extensive experiments on our newly proposed benchmark and on existing incremental learning scenarios demonstrate that our method not only performs well on the new Uni-IL tasks but also consistently outperforms previous state-of-the-art methods. Source code is available at https://github.com/ElectricField/Uni-IL.
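The attribute-matching and expert-routing steps described in the abstract can be illustrated with a small sketch. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation (the linked repository holds that). It assumes that attribute embeddings are precomputed vectors (for instance, CLIP text embeddings of LLM-generated attribute descriptions), that attributes are matched by cosine similarity, that the router is a single linear layer over the concatenated image and pooled attribute features with top-k softmax gating, and that each expert is a toy function. All function names, vectors, and weights are hypothetical.

```python
import math

# Hypothetical sketch of attribute-guided MoE routing; not the MAGE code.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_attributes(image_feat, attribute_pool, k=2):
    """Return the k attribute names whose embeddings are closest to the image feature."""
    scored = sorted(attribute_pool.items(),
                    key=lambda kv: cosine(image_feat, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]

def mean(vectors):
    """Element-wise mean of a list of vectors (used to pool matched attributes)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def route(image_feat, attr_feat, router_weights, top_k=2):
    """Score experts from [image ; attribute] features, keep top_k, renormalise gates."""
    x = image_feat + attr_feat  # list concatenation, not addition
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    total = sum(probs[i] for i in chosen)
    return {i: probs[i] / total for i in chosen}

def moe_forward(image_feat, gates, experts):
    """Gate-weighted sum of the selected experts' outputs."""
    out = [0.0] * len(image_feat)
    for i, g in gates.items():
        y = experts[i](image_feat)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out

# Toy usage with made-up 2-D features and three toy experts.
pool = {"striped fur": [1.0, 0.0], "sketch style": [0.0, 1.0], "photo style": [0.7, 0.7]}
img = [0.9, 0.1]
attrs = match_attributes(img, pool, k=2)
attr_feat = mean([pool[a] for a in attrs])
router = [[1, 0, 1, 0], [0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]]
experts = [lambda v: [2 * t for t in v],
           lambda v: [0.0 for _ in v],
           lambda v: [t + 1 for t in v]]
gates = route(img, attr_feat, router, top_k=2)
out = moe_forward(img, gates, experts)
print(attrs, gates, out)
```

Conditioning the router on both the image and its matched attributes is what allows inputs from different domains or classes to activate different expert subsets. How MAGE actually parameterizes its routers and experts is defined in the paper, not in this sketch.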

Original language: English
Title of host publication: Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
Editors: Tat-Seng Chua, Lai-Kuan Wong, Chee Seng Chan, Jinhui Tang, Chong-Wah Ngo, Klaus Schoeffmann, Jiaying Liu, Yo-Sung Ho
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400720055
Publication status: Published - 6 Dec 2025
Event: 7th ACM International Conference on Multimedia in Asia, MMAsia 2025 - Kuala Lumpur, Malaysia
Duration: 9 Dec 2025 to 12 Dec 2025

Publication series

Name: Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025

Conference

Conference: 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 9/12/25 to 12/12/25

Keywords

  • Attribute-guided learning
  • Incremental learning
  • Mixture of experts
  • Vision-language models
