
Source code plagiarism detection in academia with information retrieval : dataset and the observation
Karnalim, Oscar, Budi, Setia, Toba, H and Joy, Mike (2019) Source code plagiarism detection in academia with information retrieval : dataset and the observation. Informatics in Education, 18 (2). pp. 321-344. doi:10.15388/infedu.2019.15
PDF: WRAP-source-code-plagiarism-detection-academia-information-retrieval-Joy-2019.pdf - Published Version. Available under License Creative Commons: Attribution-Share Alike 4.0. (1911Kb)
PDF: WRAP-source-code-plagiarism-detection-academia-information-retrieval-dataset-observation-Joy-2019.pdf - Accepted Version. Embargoed item; access restricted to Repository staff only. (550Kb)
Official URL: http://www.doi.org/10.15388/infedu.2019.15
Abstract
Source code plagiarism is an emerging issue in computer science education. As a result, a number of techniques have been proposed to handle this issue. However, comparing these techniques may be challenging, since they are evaluated with their own private dataset(s). This paper contributes in providing a public dataset for comparing these techniques. Specifically, the dataset is designed for evaluation with an Information Retrieval (IR) perspective. The dataset consists of 467 source code files, covering seven introductory programming assessment tasks. Unique to this dataset, both intention to plagiarise and advanced plagiarism attacks are considered in its construction. The dataset's characteristics were observed by comparing three IR-based detection techniques, and it is clear that most IR-based techniques are less effective than a baseline technique which relies on Running-Karp-Rabin Greedy-String-Tiling, even though some of them are far more time-efficient.
Item Type: | Journal Article
---|---
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Divisions: | Faculty of Science, Engineering and Medicine > Science > Computer Science
Library of Congress Subject Headings (LCSH): | Data sets, Source code (Computer science), Plagiarism -- Software
Journal or Publication Title: | Informatics in Education
Publisher: | Vilnius University Institute of Data Science and Digital Technologies
ISSN: | 1648-5831
Official Date: | 2019
Volume: | 18
Number: | 2
Page Range: | pp. 321-344
DOI: | 10.15388/infedu.2019.15
Status: | Peer Reviewed
Publication Status: | Published
Access rights to Published version: | Open Access (Creative Commons)
Date of first compliant deposit: | 23 October 2019
Date of first compliant Open Access: | 6 November 2019