Detecting Copy Directions among Programs Using Extreme Learning Machines

Bin Wang, Xiaochun Yang*, Guoren Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

Because software development is complex, some developers plagiarize source code from other projects or from open source software to shorten the development cycle. Many methods have been proposed to detect plagiarism among programs based on the program dependence graph, a graph representation of a program. However, to the best of our knowledge, existing works only detect similarity between programs and do not detect the copy direction between them. Using an extreme learning machine (ELM), we construct a feature space that describes every pair of programs with a possible plagiarism relationship. Such a feature space can be large and time-consuming to build, so we propose approaches that construct a small feature space by pruning isolated control statements and removable statements from each program, reducing both training and classification time. We also analyze the characteristics of the data dependencies between an original program and its copy, and based on this analysis we propose a feedback framework that finds a feature space achieving both accuracy and efficiency. We conducted a thorough experimental study of this technique on real C programs collected from the Internet. The experimental results show that our ELM-based approaches are highly accurate and efficient.
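As a rough illustration of the classification step only, the sketch below trains a generic extreme learning machine on pair feature vectors labeled with a copy direction (+1: the first program is the original, -1: the second is). The abstract does not say which features the authors extract, so the two-dimensional inputs here are synthetic, hypothetical stand-ins, and the helper names (`make_pairs`, `demo`), the sigmoid activation, the hidden-layer size, and the regularization constant `c` are assumptions rather than the paper's settings. The sketch follows the standard ELM recipe: hidden-layer weights are drawn at random and never trained, and the output weights are solved in closed form as the ridge-regularized least-squares solution beta = (HᵀH + I/C)⁻¹ Hᵀy.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic activation for the hidden layer.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        for j in range(col, n + 1):
            m[col][j] /= p
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                for j in range(col, n + 1):
                    m[r][j] -= f * m[col][j]
    return [m[i][n] for i in range(n)]


class ELM:
    """Single-hidden-layer ELM: random hidden weights, closed-form output weights."""

    def __init__(self, n_hidden=20, c=100.0, seed=0):
        self.n_hidden = n_hidden
        self.c = c  # hypothetical ridge constant; larger c means weaker regularization
        self.rng = random.Random(seed)

    def _hidden(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        d = len(xs[0])
        # Input weights and biases are drawn once at random and never trained.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(self.n_hidden)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        h = [self._hidden(x) for x in xs]
        k = self.n_hidden
        # Regularized normal equations: (H^T H + I/C) beta = H^T y.
        hth = [[sum(row[i] * row[j] for row in h) + (1.0 / self.c if i == j else 0.0)
                for j in range(k)] for i in range(k)]
        hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(k)]
        self.beta = solve(hth, hty)
        return self

    def predict(self, x):
        score = sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x)))
        return 1 if score >= 0 else -1


def make_pairs(n, rng):
    """Hypothetical synthetic pair features [u, v]; label +1 if u > v, else -1."""
    xs, ys = [], []
    while len(xs) < n:
        u, v = rng.random(), rng.random()
        if abs(u - v) < 0.1:
            continue  # keep a margin so the toy task is unambiguous
        xs.append([u, v])
        ys.append(1 if u > v else -1)
    return xs, ys


def demo(seed=42):
    rng = random.Random(seed)
    train_x, train_y = make_pairs(200, rng)
    test_x, test_y = make_pairs(100, rng)
    model = ELM(n_hidden=20, c=100.0, seed=seed).fit(train_x, train_y)
    correct = sum(model.predict(x) == y for x, y in zip(test_x, test_y))
    return correct / len(test_x)


if __name__ == "__main__":
    print(f"test accuracy: {demo():.2f}")
```

Because only the output layer is fitted, and by a single linear solve, training cost is dominated by one k×k system (k hidden units). That is one reason a smaller feature space, such as the pruned one the abstract describes, speeds up both training and classification.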

Original language: English
Article number: 793697
Journal: Mathematical Problems in Engineering
Volume: 2015
Publication status: Published - 2015
Externally published: Yes
