Abstract
Deep model training on extensive datasets is increasingly cost-prohibitive, prompting adoption of deep model fusion to leverage knowledge from pre-existing models. From weight averaging to more sophisticated methods, fusion effectively improves model performance and accelerates new model development. However, parameter interference between models and the lack of interpretability remain challenges. Existing methods address interference by evaluating parameter attributes, such as magnitude or sign, or by pruning. We begin by examining the fine-tuning of linear layers through the lens of subspace analysis and formulate parameter interference as an optimization problem. We then introduce a novel approach, zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction, which upscales source models into an MoE model without extra data or further training. Our approach rests on the observation that fine-tuning largely preserves the important components learned during pre-training while exploiting less significant or previously unused dimensions to adapt to new tasks. Moreover, parameter interference, which is intrinsically difficult to resolve in the original parameter space, becomes manageable once the dimensionality is expanded. We conduct extensive experiments on both image classification and text generation tasks, under both full fine-tuning and LoRA fine-tuning, and we apply our method to large language models (LLMs), highlighting the adaptability and scalability of SMILE. For fully fine-tuned models, roughly 50% additional parameters achieve around 98-99% of the performance of eight individual fine-tuned ViT models, while for LoRA fine-tuned Flan-T5 models, 99% of the performance is maintained with only 2% extra parameters.
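To make the construction concrete, the following is a minimal sketch of the idea described in the abstract: each fine-tuned model's weight delta is compressed into a low-rank expert via truncated SVD, and at inference an input is routed to the experts whose input subspaces it aligns with most strongly. Function names, the rank/top-k parameters, and the norm-based routing score are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def build_smile_layer(W0, finetuned_weights, rank=2):
    """Extract a rank-`rank` expert from each fine-tuned model's weight
    delta via truncated SVD (illustrative sketch, not the official code)."""
    experts = []
    for W in finetuned_weights:
        U, s, Vt = np.linalg.svd(W - W0, full_matrices=False)
        # Keep only the top singular directions: the subspace that
        # fine-tuning actually used to adapt to the new task.
        experts.append((U[:, :rank] * s[:rank], Vt[:rank, :]))
    return W0, experts

def smile_forward(W0, experts, x, top_k=1):
    """Shared pre-trained path plus sparsely gated low-rank experts."""
    # Route by how strongly x aligns with each expert's input subspace.
    scores = np.array([np.linalg.norm(Vt @ x) for _, Vt in experts])
    chosen = np.argsort(scores)[-top_k:]
    gates = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
    y = W0 @ x  # pre-trained backbone output
    for g, i in zip(gates, chosen):
        Us, Vt = experts[i]
        y = y + g * (Us @ (Vt @ x))  # low-rank expert correction
    return y
```

Because each expert stores only `rank` singular directions instead of a full weight matrix, the upscaled MoE layer adds far fewer parameters than keeping all source models, which is the source of the parameter-efficiency figures quoted above.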
| Original language | English |
|---|---|
| Pages (from-to) | 1145-1157 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Volume | 48 |
| Issue number | 2 |
| DOIs | |
| Publication status | Published - 9 Jan 2026 |
| Externally published | Yes |
Keywords
- Large Language Model
- Mixture of Experts
- Model Fusion
- Subspace Decomposition
Title
Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models