TY - JOUR
T1 - Deep Model Fusion
T2 - A Survey
AU - Li, Weishi
AU - Peng, Yong
AU - Zhang, Miao
AU - Ding, Liang
AU - Hu, Han
AU - Shen, Li
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Deep model fusion/merging is an emerging technique that integrates the parameters or predictions of multiple deep learning (DL) models into a unified framework. It combines the strengths of different models to compensate for the biases and errors of any individual model, improving overall performance. However, deep model fusion, especially for large-scale DL models such as large language models (LLMs) and foundation models, faces several challenges, including high computational cost and interference between heterogeneous models. To better understand this area, we present a comprehensive survey of recent progress. We categorize existing model fusion methods into four groups: 1) weight average (WA), which averages the parameters of multiple models to obtain results closer to the optimal solution; 2) 'mode connectivity', which, since direct averaging of models often yields suboptimal results, connects networks via paths of nonincreasing loss in weight space before fusion; along these paths, the initial models are transformed into functionally consistent forms that fuse more effectively; 3) 'alignment', which, similarly for models that merge poorly when fused directly, matches corresponding units before merging, fully exploiting the correspondences between the models; and 4) 'ensemble learning', which, beyond the parameter-fusion methods above, fuses the outputs of multiple models at inference time to improve the accuracy and robustness of networks. Finally, we analyze the challenges of deep model fusion and highlight possible directions for future research.
AB - Deep model fusion/merging is an emerging technique that integrates the parameters or predictions of multiple deep learning (DL) models into a unified framework. It combines the strengths of different models to compensate for the biases and errors of any individual model, improving overall performance. However, deep model fusion, especially for large-scale DL models such as large language models (LLMs) and foundation models, faces several challenges, including high computational cost and interference between heterogeneous models. To better understand this area, we present a comprehensive survey of recent progress. We categorize existing model fusion methods into four groups: 1) weight average (WA), which averages the parameters of multiple models to obtain results closer to the optimal solution; 2) 'mode connectivity', which, since direct averaging of models often yields suboptimal results, connects networks via paths of nonincreasing loss in weight space before fusion; along these paths, the initial models are transformed into functionally consistent forms that fuse more effectively; 3) 'alignment', which, similarly for models that merge poorly when fused directly, matches corresponding units before merging, fully exploiting the correspondences between the models; and 4) 'ensemble learning', which, beyond the parameter-fusion methods above, fuses the outputs of multiple models at inference time to improve the accuracy and robustness of networks. Finally, we analyze the challenges of deep model fusion and highlight possible directions for future research.
KW - Deep learning (DL)
KW - federated learning (FL)
KW - large language models (LLMs)
KW - model aggregation
KW - model fusion
KW - survey
UR - https://www.scopus.com/pages/publications/105023161830
U2 - 10.1109/TNNLS.2025.3628666
DO - 10.1109/TNNLS.2025.3628666
M3 - Article
AN - SCOPUS:105023161830
SN - 2162-237X
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
ER -