TY - GEN
T1 - Instruction Fine-Tuning Guidance
T2 - 1st International Conference on Intelligent Multilingual Information Processing, IMLIP 2024
AU - Jiang, Linjia
AU - Shi, Shumin
AU - Yang, Cheng
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - In this paper, we investigate the impact of different Parameter Efficient Fine-tuning (PEFT) methods on the generation performance of large language models, focusing on how different methods affect the model across various attributes. We evaluate five well-known PEFT methods: LoRA, AdaLoRA, LoHA, IA3, and P-Tuning, taking computational resources, scalability, efficiency, and other factors into account. We use the Qwen-7B-Chat model as a baseline, instruction fine-tune it on the alpaca-cleaned dataset, and evaluate the resulting models on the MT-Bench and HelpSteer datasets using several metrics, including helpfulness, correctness, consistency, complexity, and verbosity. Experimental results show that while all methods enhance the model's generation performance, they exhibit different advantages and trade-offs in terms of text quality, diversity, and computational efficiency. Ablation studies further explore the impact of the number of trainable parameters and of the target modules on generation performance, and find that reducing either can still yield improved performance. Our analysis provides practical guidance for selecting the most appropriate fine-tuning method for specific task requirements, along with insights into how these methods affect language model generation after instruction fine-tuning.
AB - In this paper, we investigate the impact of different Parameter Efficient Fine-tuning (PEFT) methods on the generation performance of large language models, focusing on how different methods affect the model across various attributes. We evaluate five well-known PEFT methods: LoRA, AdaLoRA, LoHA, IA3, and P-Tuning, taking computational resources, scalability, efficiency, and other factors into account. We use the Qwen-7B-Chat model as a baseline, instruction fine-tune it on the alpaca-cleaned dataset, and evaluate the resulting models on the MT-Bench and HelpSteer datasets using several metrics, including helpfulness, correctness, consistency, complexity, and verbosity. Experimental results show that while all methods enhance the model's generation performance, they exhibit different advantages and trade-offs in terms of text quality, diversity, and computational efficiency. Ablation studies further explore the impact of the number of trainable parameters and of the target modules on generation performance, and find that reducing either can still yield improved performance. Our analysis provides practical guidance for selecting the most appropriate fine-tuning method for specific task requirements, along with insights into how these methods affect language model generation after instruction fine-tuning.
KW - Instruction Tuning
KW - Language Models
KW - Parameter Efficient Fine-tuning
UR - http://www.scopus.com/inward/record.url?scp=105008368672&partnerID=8YFLogxK
U2 - 10.1007/978-981-96-5123-8_19
DO - 10.1007/978-981-96-5123-8_19
M3 - Conference contribution
AN - SCOPUS:105008368672
SN - 9789819651221
T3 - Communications in Computer and Information Science
SP - 277
EP - 292
BT - Intelligent Multilingual Information Processing - 1st International Conference, IMLIP 2024, Proceedings
A2 - Zhang, Huaping
A2 - Shang, Jianyun
A2 - Su, Jinsong
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 16 November 2024 through 17 November 2024
ER -