AC: An Integrated Source Code Plagiarism Detection Environment

Authors: Manuel Freire, Manuel Cebrian, Emilio del Rosal
Comments: 57 pages, 11 figures
Subj-class: Information Theory

Plagiarism detection in programming assignments remains a difficult problem in terms of economic cost, conceptual controversy, legal risk, and detection algorithms and heuristics. In this paper, we present AC, an integrated environment for the study of plagiarism and a powerful tool for its detection. We explain the design of AC, which is prepared for open-ended improvement and external contributions, and how it can be used to detect plagiarism. In particular, the visualization of results that AC offers, together with the statistical analysis it performs, provides several useful heuristics for identifying suspects. In addition, AC incorporates several measures of similarity, both established ones and new ones developed by our group. Finally, we study the performance of AC on two practical examples of programming assignment submissions.
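The abstract does not say which similarity measures AC implements. As a generic illustration of what a pairwise similarity measure over submissions can look like, the following minimal sketch uses the normalized compression distance (NCD) with zlib. The choice of NCD, the function names, and the ranking helper are assumptions made for illustration, not a description of AC's code.

```python
import zlib
from typing import Dict, List, Tuple


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: low for similar texts, near 1 for unrelated ones."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_pairs(submissions: Dict[str, str]) -> List[Tuple[float, str, str]]:
    """Return all unordered pairs of submissions, most similar (smallest distance) first."""
    names = sorted(submissions)
    pairs = [
        (ncd(submissions[p], submissions[q]), p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
    ]
    return sorted(pairs)
```

Pairs at the top of the ranking are candidates for manual inspection only; a small distance is a heuristic signal, not proof of copying.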


Plagiarism Detection Tools

Automatic Generation of Benchmarks for Plagiarism Detection Tools using Grammatical Evolution

Authors: Manuel Cebrian, Manuel Alfonseca, Alfonso Ortega
Comments: 8 pages, 9 figures. Extended version of the poster accepted in GECCO’07

Student plagiarism is a major problem in universities worldwide. In this paper, we focus on plagiarism in answers to computer programming assignments, where students mix and/or modify one or more original solutions to obtain counterfeits. Although several software tools have been implemented to help with the tedious and time-consuming task of detecting plagiarism, little has been done to assess their quality, because determining which solutions in the whole solution set are original is practically impossible for graders. In this article we present a Grammatical Evolution technique (http://www.grammatical-evolution.org/ or http://www.grammaticalevolution.org/) that generates benchmarks. Given a programming language, our technique generates a set of original solutions to an assignment, together with a set of plagiarisms of the former set that mimic the way in which students act. The phylogeny of the coded solutions is predefined, providing a basis for evaluating the performance of copy-catching tools. We give empirical evidence of the suitability of our approach by studying the behavior of one state-of-the-art detection tool (AC) on four benchmarks coded in APL2, generated with this technique.

http://arxiv.org/abs/cs.NE/0703134
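
The benchmark idea in the abstract above rests on Grammatical Evolution, in which a list of integer codons selects grammar productions to build a program. The following minimal sketch shows the usual codon-modulo mapping, plus a derived "plagiarism" whose parent is known by construction. The toy grammar, the function names, and the single-codon mutation are assumptions made for illustration; the paper's actual grammar (for APL2) and plagiarism operators are not given in the abstract.

```python
from typing import Dict, List, Optional

# Toy grammar (hypothetical, not from the paper): each nonterminal maps to
# a list of alternative productions, each production being a list of symbols.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x"], ["y"]],
}


def ge_map(genome: List[int], start: str = "<expr>", max_wraps: int = 2) -> Optional[str]:
    """Expand the leftmost nonterminal with production number (codon mod choices).

    The genome is reused ("wrapped") at most max_wraps times; None is returned
    if the derivation has not finished by then.
    """
    symbols = [start]
    output: List[str] = []
    used = 0
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in GRAMMAR:      # terminal symbol: emit it
            output.append(sym)
            continue
        if used >= limit:           # codons exhausted, give up
            return None
        choices = GRAMMAR[sym]
        codon = genome[used % len(genome)]
        used += 1
        symbols = choices[codon % len(choices)] + symbols
    return " ".join(output)


def plagiarize(genome: List[int], position: int, new_codon: int) -> List[int]:
    """Derive a modified copy of a genome by changing a single codon."""
    copy = list(genome)
    copy[position] = new_codon
    return copy


original = [0, 1, 0, 0, 1, 1]
derived = plagiarize(original, 3, 1)
phylogeny = {"derived": "original"}  # child -> parent, known by construction

print(ge_map(original))  # x + y
print(ge_map(derived))   # x * y
```

Because every derived genome is recorded with its parent, the output of a detection tool can be checked against the known phylogeny. Canonical Grammatical Evolution consumes a codon only when a nonterminal has more than one production; every nonterminal in this toy grammar has two, so the distinction does not arise here.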