Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models

  • Anke Tang
  • Li Shen*
  • Yong Luo*
  • Shuai Xie
  • Han Hu
  • Lefei Zhang
  • Bo Du*
  • Dacheng Tao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Deep model training on extensive datasets is increasingly cost-prohibitive, prompting the adoption of deep model fusion to leverage knowledge from pre-existing models. From weight averaging to more sophisticated methods, fusion effectively improves model performance and accelerates new model development. However, parameter interference between models and a lack of interpretability remain challenges. Existing methods address interference by evaluating parameter attributes, such as magnitude or sign, or by pruning. We begin by examining the fine-tuning of linear layers through the lens of subspace analysis and define parameter interference as an optimization problem. We then introduce a zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction, which upscales source models into an MoE model without extra data or training. Our approach relies on the observation that fine-tuning largely preserves the important subspaces learned during pre-training and adapts to new tasks through less significant or previously unused subspaces. Moreover, parameter interference, which is intrinsically hard to resolve in the original parameter space, can be managed by expanding the dimensionality. We conduct extensive experiments across image classification and text generation tasks, using both full fine-tuning and LoRA fine-tuning, and we apply our method to LLMs, highlighting the adaptability and scalability of SMILE. For fully fine-tuned models, about 50% additional parameters achieve around 98%-99% of the performance of eight individual fine-tuned ViT models; for LoRA fine-tuned Flan-T5 models, SMILE maintains 99% of the individual performance with only 2% extra parameters.
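To make the construction concrete, here is a minimal sketch of the core idea as described in the abstract, not the authors' reference implementation: compress each fine-tuning update of a linear layer into a low-rank expert via truncated SVD, then route inputs sparsely on top of the shared pre-trained weight. The function names, the projection-norm routing score, and the top-k gating are illustrative assumptions.

```python
import torch

def build_low_rank_expert(w_pre: torch.Tensor, w_ft: torch.Tensor, rank: int):
    """Compress the fine-tuning update (w_ft - w_pre) of one linear layer
    into a rank-`rank` expert via truncated SVD, so that delta ~= B @ A."""
    delta = w_ft - w_pre                       # task-specific update
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    A = Vh[:rank, :]                           # (rank, d_in): projects input onto the task subspace
    B = U[:, :rank] * S[:rank]                 # (d_out, rank): maps back to the output space
    return A, B

def smile_linear_forward(x, w_pre, experts, k=1):
    """Shared pre-trained weight plus a sparse top-k mixture of low-rank experts.
    Routing score (an assumption for illustration): the norm of x projected onto
    each expert's input subspace -- data-free, so no router training is needed."""
    scores = torch.stack([torch.linalg.norm(A @ x) for A, _ in experts])
    top = torch.topk(scores, k).indices
    gates = torch.softmax(scores[top], dim=0)  # normalize over the selected experts
    y = w_pre @ x                              # shared pre-trained path
    for g, i in zip(gates.tolist(), top.tolist()):
        A, B = experts[i]
        y = y + g * (B @ (A @ x))              # add the gated low-rank expert output
    return y

# Example: merge three fine-tuned 768x768 layers into one SMILE layer.
d, rank = 768, 16
w_pre = torch.randn(d, d) / d ** 0.5
experts = [build_low_rank_expert(w_pre, w_pre + 0.01 * torch.randn(d, d), rank)
           for _ in range(3)]
y = smile_linear_forward(torch.randn(d), w_pre, experts, k=1)
```

Because the router is derived directly from the SVD factors of each fine-tuning update rather than learned, the whole construction is zero-shot in the sense used above: no extra data or training is required.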

Original language: English
Pages (from-to): 1145-1157
Number of pages: 13
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 48
Issue number: 2
DOIs
Publication status: Published - 9 Jan 2026
Externally published: Yes

Keywords

  • Large Language Model
  • Mixture of Experts
  • Model Fusion
  • Subspace Decomposition
