Targeted Low-rank Refinement: Enhancing Sparse Language Models with Precision

  • Li Shen
  • Anke Tang*
  • Yong Luo*
  • Tao Sun
  • Han Hu
  • Xiaochun Cao

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Pruning is a widely used technique for compressing large neural networks by eliminating weights whose removal has minimal impact on performance. Current pruning methods, exemplified by magnitude pruning, assign importance scores to weights based on their magnitude and remove those below a certain threshold. However, these methods introduce a gap between the original dense model and the pruned sparse model, potentially impairing performance, especially at high sparsity ratios. To address this issue, we introduce a method that bridges this gap through a low-rank approximation of the difference between the dense and sparse weight matrices. Our approach iteratively refines the sparse weight matrix with a low-rank adjustment, capturing essential information typically lost during pruning. We provide a comprehensive theoretical analysis of our method, establishing its convergence properties and efficacy. Experimental results on LLaMA models validate our method's effectiveness across various pruning techniques and sparsity levels. At 50% sparsity, it reduces perplexity by 53.9% compared to conventional magnitude pruning on LLaMA-7B. Furthermore, our approach enables an 8.6% reduction in model parameters while maintaining a sparsity ratio of about 50%.
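The abstract describes correcting the dense-sparse residual with a low-rank term. Below is a minimal PyTorch sketch of that idea, assuming magnitude pruning and a truncated-SVD correction; the function name `low_rank_refine` and the alternating update are hypothetical illustrations, not the paper's exact algorithm.

```python
import torch

def low_rank_refine(W_dense: torch.Tensor, rank: int = 8,
                    sparsity: float = 0.5, n_iters: int = 5):
    """Sketch: prune W_dense by magnitude, then alternately fit a
    rank-`rank` correction to the dense-sparse residual and refit the
    sparse part on its fixed support. Hypothetical, for illustration."""
    # Magnitude pruning: zero out the smallest-magnitude entries.
    k = max(1, int(W_dense.numel() * sparsity))
    threshold = W_dense.abs().flatten().kthvalue(k).values
    mask = W_dense.abs() > threshold
    W_sparse = W_dense * mask

    U = V = None
    for _ in range(n_iters):
        # Residual between the dense weights and the current sparse part.
        residual = W_dense - W_sparse
        # Best rank-r approximation of the residual via truncated SVD.
        Us, S, Vh = torch.linalg.svd(residual, full_matrices=False)
        U = Us[:, :rank] * S[:rank]   # scale columns by singular values
        V = Vh[:rank, :]
        # Refit the sparse part against what the low-rank term misses,
        # keeping the same sparsity mask (support) fixed.
        W_sparse = (W_dense - U @ V) * mask
    return W_sparse, U, V
```

Applied per weight matrix, this yields the approximation W ≈ W_sparse + U @ V, where the low-rank term adds only rank × (m + n) parameters to an m × n layer.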

Original language: English
Pages (from-to): 54457-54475
Number of pages: 19
Journal: Proceedings of Machine Learning Research
Volume: 267
Publication status: Published - 2025
Externally published: Yes
Event: 42nd International Conference on Machine Learning, ICML 2025 - Vancouver, Canada
Duration: 13 Jul 2025 – 19 Jul 2025
