Authorship identification of source codes

Chunxia Zhang*, Sen Wang, Jiayu Wu, Zhendong Niu

*Corresponding author for this work

Research output: Chapter in book/report/conference proceeding, conference contribution, peer-reviewed

18 citations (Scopus)

Abstract

Source code authorship identification is a form of document authorship identification: the task is to identify the author of a program from samples of source code written by known programmers. Its main applications include detecting software intellectual property infringement, detecting malicious code, and supporting software maintenance and updates. This paper proposes an approach for constructing author profiles of programmers based on a logic model of continuous and discrete word-level n-grams, together with a multi-level context model of operations, loops, arrays and methods. We then train a support vector machine with sequential minimal optimization to identify the authorship of source code. The advantage of these author profiles is that they can capture explicit and implicit personal programming preference patterns of and between keywords, identifiers, operators, statements, methods and classes. Experimental results on programs from two open source websites show that the approach achieves high accuracy and outperforms the baseline methods.
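As a rough illustration of the word-level n-gram part of such an author profile, the sketch below counts continuous and discrete bigrams over a token stream. It is not the paper's implementation. Several things here are assumptions: the tokenizer, the reading of "discrete" n-grams as non-contiguous token pairs separated by a small gap, and all function and parameter names (`tokenize`, `author_profile`, `max_gap`). The multi-level context model is not covered, and a simple cosine nearest-profile comparison stands in for the SVM trained with sequential minimal optimization.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers/keywords, integer literals,
# and any other non-space character as a single-character token.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    return TOKEN_RE.findall(source)


def continuous_ngrams(tokens, n=2):
    """Contiguous word-level n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def discrete_bigrams(tokens, max_gap=3):
    """Non-contiguous token pairs whose positions differ by 2..max_gap.

    Assumption: this is one plausible reading of 'discrete' n-grams.
    """
    grams = []
    for gap in range(2, max_gap + 1):
        for i in range(len(tokens) - gap):
            grams.append((tokens[i], tokens[i + gap]))
    return grams


def author_profile(sources, n=2, max_gap=3):
    """Count continuous and discrete n-grams over an author's samples."""
    profile = Counter()
    for src in sources:
        toks = tokenize(src)
        profile.update(("cont",) + g for g in continuous_ngrams(toks, n))
        profile.update(("disc",) + g for g in discrete_bigrams(toks, max_gap))
    return profile


def cosine(p, q):
    """Cosine similarity between two count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = math.sqrt(sum(c * c for c in p.values()))
    norm_q = math.sqrt(sum(c * c for c in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


# Toy samples (invented for illustration, not data from the paper).
profiles = {
    "alice": author_profile(["for (int i = 0; i < n; ++i) { sum += a[i]; }"]),
    "bob": author_profile(["int i = 0; while (i < n) { sum = sum + a[i]; i++; }"]),
}
unknown = author_profile(["for (int j = 0; j < m; ++j) { total += b[j]; }"])

# Stand-in classifier: pick the known profile closest to the unknown one.
best = max(profiles, key=lambda name: cosine(profiles[name], unknown))
print(best)
```

In a fuller pipeline the counts would be turned into feature vectors and passed to an SVM trainer, which is the classifier the abstract names. The nearest-profile step here only shows that the n-gram counts carry a stylistic signal.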

Original language: English
Title of host publication: Web and Big Data - 1st International Joint Conference, APWeb-WAIM 2017, Proceedings
Editors: Cyrus Shahabi, Xiang Lian, Christian S. Jensen, Xiaochun Yang, Lei Chen
Publisher: Springer Verlag
Pages: 282-296
Number of pages: 15
ISBN (Print): 9783319635781
Publication status: Published - 2017
Event: 1st Asia-Pacific Web and Web-Age Information Management Joint Conference on Web and Big Data, APWeb-WAIM 2017 - Beijing, China
Duration: 7 Jul 2017 to 9 Jul 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10366 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st Asia-Pacific Web and Web-Age Information Management Joint Conference on Web and Big Data, APWeb-WAIM 2017
Country/Territory: China
City: Beijing
Period: 7/07/17 to 9/07/17
