TY - JOUR
T1 - An Automated Approach to Discovering Software Refactorings by Comparing Successive Versions
AU - Liu, Bo
AU - Liu, Hui
AU - Niu, Nan
AU - Zhang, Yuxia
AU - Li, Guangjie
AU - Jiang, He
AU - Jiang, Yanjie
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Software developers and maintainers frequently conduct software refactorings to improve software quality. Identifying the conducted software refactorings may significantly facilitate the comprehension of software evolution, and thus facilitate software maintenance and evolution. Besides that, the identified refactorings are also valuable for data-driven approaches in software refactoring. To this end, researchers have proposed a few approaches to identifying software refactorings automatically. However, the performance (especially precision) of such approaches deserves substantial improvement. To this end, in this paper, we propose a novel refactoring detection approach, called REEXTRACTOR+. At the heart of REEXTRACTOR+ is a reference-based entity matching algorithm that matches coarse-grained code entities (e.g., classes and methods) between two successive versions, and a context-aware statement matching algorithm that matches statements within a pair of matched methods. We evaluated REEXTRACTOR+ on a benchmark consisting of 400 commits from 20 real-world projects. The evaluation results suggested that REEXTRACTOR+ significantly outperformed the state of the art in refactoring detection, reducing the number of false positives by 59.6% and improving recall by 19.2%. We also evaluated the performance of the proposed matching algorithms that serve as the cornerstone of refactoring detection. The evaluation results suggested that the proposed algorithms excel in matching code entities, substantially reducing the number of mistakes (false positives plus false negatives) by 67% compared to the state-of-the-art approaches.
KW - Detection
KW - Entity Matching
KW - History
KW - Software Evolution
KW - Software Refactoring
UR - http://www.scopus.com/inward/record.url?scp=85216890381&partnerID=8YFLogxK
U2 - 10.1109/TSE.2025.3534239
DO - 10.1109/TSE.2025.3534239
M3 - Article
AN - SCOPUS:85216890381
SN - 0098-5589
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
ER -