TY - GEN
T1 - JLeaks
T2 - 44th ACM/IEEE International Conference on Software Engineering, ICSE 2024
AU - Liu, Tianyang
AU - Ji, Weixing
AU - Dong, Xiaohui
AU - Yao, Wuhuang
AU - Wang, Yizhuo
AU - Liu, Hui
AU - Peng, Haiyang
AU - Wang, Yuxuan
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024
Y1 - 2024
N2 - High-quality defect repositories are vital in defect detection, local-ization, and repair. However, existing repositories collected from open-source projects are either small-scale or inadequately labeled and packed. This paper systematically summarizes the program-ming APIs of system resources (i.e., file, socket, and thread) in Java. Additionally, this paper demonstrates the exceptions that may cause resource leaks in the chained and nested streaming operations. A semi-automatic toolchain is built to improve the efficiency of de-fect extraction, including automatic building for large legacy Java projects. Accordingly, 1,094 resource leaks were collected from 321 open-source projects on GitHub. This repository, named JLeaks, was built by round-by-round filtering and cross-validation, involving the review of approximately 3,185 commits from hundreds of projects. JLeaks is currently the largest resource leak repository, and each defect in JLeaks is well-labeled and packed, including causes, locations, patches, source files, and compiled bytecode files for 254 defects. We have conducted a detailed analysis of JLeaks for defect distribution, root causes, and fix approaches. We compare JLeaks with two well-known resource leak repositories, and the results show that JLeaks is more informative and complete, with high availability, uniqueness, and consistency. Additionally, we show the usability of JLeaks in two application scenarios. Future studies can leverage our repository to encourage better design and implementation of defect-related algorithms and tools.
AB - High-quality defect repositories are vital in defect detection, local-ization, and repair. However, existing repositories collected from open-source projects are either small-scale or inadequately labeled and packed. This paper systematically summarizes the program-ming APIs of system resources (i.e., file, socket, and thread) in Java. Additionally, this paper demonstrates the exceptions that may cause resource leaks in the chained and nested streaming operations. A semi-automatic toolchain is built to improve the efficiency of de-fect extraction, including automatic building for large legacy Java projects. Accordingly, 1,094 resource leaks were collected from 321 open-source projects on GitHub. This repository, named JLeaks, was built by round-by-round filtering and cross-validation, involving the review of approximately 3,185 commits from hundreds of projects. JLeaks is currently the largest resource leak repository, and each defect in JLeaks is well-labeled and packed, including causes, locations, patches, source files, and compiled bytecode files for 254 defects. We have conducted a detailed analysis of JLeaks for defect distribution, root causes, and fix approaches. We compare JLeaks with two well-known resource leak repositories, and the results show that JLeaks is more informative and complete, with high availability, uniqueness, and consistency. Additionally, we show the usability of JLeaks in two application scenarios. Future studies can leverage our repository to encourage better design and implementation of defect-related algorithms and tools.
KW - Defect repository
KW - Java language
KW - Open-source projects
KW - Resource leak
UR - http://www.scopus.com/inward/record.url?scp=85196819255&partnerID=8YFLogxK
U2 - 10.1145/3597503.3639162
DO - 10.1145/3597503.3639162
M3 - Conference contribution
AN - SCOPUS:85196819255
T3 - Proceedings - International Conference on Software Engineering
SP - 1723
EP - 1735
BT - Proceedings - 2024 ACM/IEEE 44th International Conference on Software Engineering, ICSE 2024
PB - IEEE Computer Society
Y2 - 14 April 2024 through 20 April 2024
ER -