TY - GEN
T1 - Uni-IL
T2 - 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
AU - Meng, Fanyu
AU - Zhan, Yufeng
AU - Zhang, Jie
AU - Wang, Zhiyuan
AU - Xia, Yuanqing
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/12/6
Y1 - 2025/12/6
N2 - With the advent of parameter-efficient fine-tuning techniques for pre-trained vision-language models, interest in adapting them to various incremental learning scenarios has grown, i.e., sequential increments in tasks, classes, and domains. However, no high-performance incremental learning framework has integrated these three incremental scenarios to achieve Unified Incremental Learning (Uni-IL) in complex settings. In this work, we propose an incremental learning framework called Mixture of Attribute-Guided Experts (MAGE) to alleviate long-term forgetting in vision-language model incremental learning. Our approach acquires image attribute knowledge via LLMs to form an attribute pool. We match the most relevant attributes as inputs to the Mixture of Experts (MoE) to fine-tune the pre-trained CLIP. The expert routers then learn to select specific expert combinations based on the data and attribute features, alleviating catastrophic forgetting. The attribute pool incorporates both domain and class knowledge, enabling our approach to adapt to the three types of incremental learning scenarios and thus facilitating unified incremental learning. Through extensive experiments on our newly proposed benchmark and existing incremental learning scenarios, the results demonstrate that our proposed method not only performs well on the new Uni-IL tasks but also consistently outperforms previous state-of-the-art methods. Source code is available at https://github.com/ElectricField/Uni-IL.
KW - Attribute-guided learning
KW - Incremental learning
KW - Mixture of experts
KW - Vision-language models
UR - https://www.scopus.com/pages/publications/105025134347
U2 - 10.1145/3743093.3771068
DO - 10.1145/3743093.3771068
M3 - Conference contribution
AN - SCOPUS:105025134347
T3 - Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
BT - Proceedings of the 7th ACM International Conference on Multimedia in Asia, MMAsia 2025
A2 - Chua, Tat-Seng
A2 - Wong, Lai-Kuan
A2 - Chan, Chee Seng
A2 - Tang, Jinhui
A2 - Ngo, Chong-Wah
A2 - Schoeffmann, Klaus
A2 - Liu, Jiaying
A2 - Ho, Yo-Sung
PB - Association for Computing Machinery, Inc
Y2 - 9 December 2025 through 12 December 2025
ER -