Data-Adaptive Weight-Ensembling for Multi-task Model Fusion

Anke Tang, Li Shen*, Yong Luo*, Shiwei Liu, Han Hu, Bo Du, Dacheng Tao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Creating a multi-task model by merging models for distinct tasks has proven to be an economical and scalable approach. Recent research, such as task arithmetic, demonstrates that a static solution for multi-task model fusion can be located within the vector space spanned by task vectors. However, the static nature of these methods limits their ability to adapt to individual instances, which hinders their performance in complex scenarios. To overcome this limitation, we propose a data-adaptive weight-ensembling approach that generates model weights on the fly. Specifically, we first feed the input samples into a hypernetwork to generate instance-specific weights for the primary model. We then perform a functional call on the primary large model with the instance-specific weights. By generating model weights on the fly, the unified model gains flexibility and can resolve potential weight conflicts between tasks. Building on this adaptability, our method requires only the model checkpoints and unlabeled test samples, which are used for test-time adaptation training. Our experiments, conducted primarily on vision Transformers and Flan-T5 models, demonstrate superior performance and satisfactory zero-shot transferability.
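The contrast the abstract draws between static merging and instance-specific weight generation can be illustrated with a toy sketch. This is not the authors' implementation: the two-parameter linear "primary model", the one-layer softmax `hypernetwork`, the hand-set `gate` matrix, and all function names are assumptions made for illustration, and the test-time adaptation step that would train the hypernetwork is omitted.

```python
import math

def task_vector(finetuned, pretrained):
    # Task vector: fine-tuned weights minus pretrained weights.
    return [f - p for f, p in zip(finetuned, pretrained)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hypernetwork(x, gate):
    # Hypothetical hypernetwork: one linear layer followed by softmax,
    # mapping an input sample to one coefficient per task vector.
    scores = [sum(g * xi for g, xi in zip(row, x)) for row in gate]
    return softmax(scores)

def merge_weights(pretrained, task_vectors, coeffs):
    # Weights inside the span of the task vectors:
    # pretrained + sum_k coeffs[k] * task_vectors[k].
    merged = list(pretrained)
    for c, tau in zip(coeffs, task_vectors):
        merged = [w + c * t for w, t in zip(merged, tau)]
    return merged

def functional_call(weights, x):
    # Toy primary model (assumption): a linear scorer y = w . x,
    # evaluated with weights passed in explicitly.
    return sum(w * xi for w, xi in zip(weights, x))

pretrained = [0.0, 0.0]
finetuned_a = [1.0, 0.0]  # toy expert for task A
finetuned_b = [0.0, 1.0]  # toy expert for task B
task_vectors = [task_vector(finetuned_a, pretrained),
                task_vector(finetuned_b, pretrained)]

# Static merging: one fixed coefficient per task, shared by every input.
static_weights = merge_weights(pretrained, task_vectors, [0.5, 0.5])

# Data-adaptive merging: coefficients depend on the input sample.
# The gate is set by hand here; in practice it would be learned.
gate = [[4.0, 0.0], [0.0, 4.0]]

def adaptive_forward(x):
    coeffs = hypernetwork(x, gate)
    weights = merge_weights(pretrained, task_vectors, coeffs)
    return functional_call(weights, x), coeffs

x_a = [1.0, 0.0]  # a sample resembling task A
y_static = functional_call(static_weights, x_a)
y_adaptive, coeffs_a = adaptive_forward(x_a)
```

In this toy setting the static model gives every input the same compromise weights, while the adaptive path shifts most of the weight toward the task-A expert for `x_a`. Restricting the generated weights to the span of the task vectors is a simplification chosen for the sketch; the abstract does not state how the hypernetwork parameterizes the instance-specific weights.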

Original language: English
Journal: International Journal of Computer Vision
Publication status: Accepted/In press - 2025
Externally published: Yes

Keywords

  • Dictionary learning
  • Hypernetwork
  • Knowledge transfer
  • Model fusion
