基于大语言模型的长方法分解

Translated title of the contribution: Large-language-model-based Decomposition of Long Methods

Zi Mao Xu, Yan Jie Jiang*, Yu Xia Zhang, Hui Liu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Long methods, like other code smells, prevent software applications from reaching their optimal readability, reusability, and maintainability. Consequently, the automated detection and decomposition of long methods have been widely studied. Although existing approaches have greatly facilitated decomposition, their solutions often differ substantially from the optimal ones. To address this, the automatable portion of a publicly available dataset of real-world long methods is investigated. Based on the findings of this investigation, this study proposes a new approach, Lsplitter, which uses large language models (LLMs) to decompose long methods automatically. For a given long method, Lsplitter decomposes it into a series of shorter methods according to heuristic rules and LLMs. However, LLMs often split out similar methods. To correct for this, Lsplitter applies a location-based algorithm to the LLM decomposition results, merging physically contiguous and highly similar methods into a longer method. Finally, the candidate results are ranked. Experiments are conducted on 2,849 long methods from real Java projects. The results show that the hit rate of Lsplitter is 142% higher than that of traditional methods combined with a modularity matrix, and 7.6% higher than that of methods based purely on LLMs.
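The location-based merging step can be illustrated with a minimal sketch. The abstract does not say how Lsplitter measures similarity or represents candidates, so everything here is an assumption made for illustration: the `Fragment` type, the use of Jaccard similarity over identifier tokens, and the `threshold` value are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A candidate extracted method (hypothetical representation)."""
    start: int         # first line in the original long method, inclusive
    end: int           # last line in the original long method, inclusive
    tokens: frozenset  # identifiers used in the fragment (assumed feature)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_contiguous(fragments, threshold=0.5):
    """Merge neighbouring fragments that are contiguous and similar.

    Two fragments are merged only if the second starts on the line right
    after the first ends AND their similarity reaches the threshold
    (an assumed cut-off, not a value from the paper).
    """
    merged = []
    for frag in sorted(fragments, key=lambda f: f.start):
        if merged:
            prev = merged[-1]
            contiguous = frag.start == prev.end + 1
            if contiguous and jaccard(prev.tokens, frag.tokens) >= threshold:
                # Replace the previous fragment with the combined, longer one.
                merged[-1] = Fragment(prev.start, frag.end,
                                      prev.tokens | frag.tokens)
                continue
        merged.append(frag)
    return merged
```

In this sketch, two fragments covering lines 1-5 and 6-9 that share most of their identifiers would be merged into one fragment covering lines 1-9, while a similar fragment starting at line 12 would stay separate because it is not physically adjacent.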

Original language: Chinese (Traditional)
Pages (from-to): 2501-2514
Number of pages: 14
Journal: Ruan Jian Xue Bao/Journal of Software
Volume: 36
Issue number: 6
Publication status: Published - 2025
Externally published: Yes
