University of Warwick
Publications service & WRAP

Style analysis for source code plagiarism detection

Mirza, O. M. (2018) Style analysis for source code plagiarism detection. PhD thesis, University of Warwick.

PDF: WRAP_Theses_Mirza_2018.pdf - Submitted Version (7Mb)
Embargoed item. Restricted access to Repository staff only until 24 May 2021. Contact the author directly, specifying your needs.
Official URL: http://webcat.warwick.ac.uk/record=b3439454~S15

Abstract

The enormous growth in available online code resources has created new challenges for detecting plagiarism in program source code. Several software applications can detect source-code similarity using different detection methods; however, few current tools detect every kind of plagiarism attack. The aim of this thesis is therefore to enhance methods for detecting plagiarism in source code using a style analysis approach that has previously been used for authorship detection.

There are very few large source-code datasets suitable for research purposes; two such datasets are the BlackBox dataset and the SOCO (Detection of SOurce COde) dataset. SOCO is a benchmark dataset containing groups of similar source-code files that can be considered plagiarised, and it has been used in authorship and plagiarism detection competitions.

In the first part of the thesis, the suitability of BlackBox as a source of datasets for testing plagiarism detection is explored. The files in BlackBox were analysed and visualised to evaluate whether the dataset can be used in this research. The analysis aimed to identify similar source-code files, and thereby to detect groups of Java files within BlackBox that can be used to evaluate the performance of source-code plagiarism detection methods.

In the second part of the thesis, a plagiarism detection framework, the Metric-File Matrix Framework (MFM), is proposed. The MFM framework is designed to overcome some of the limitations of existing plagiarism detection methods by 1) proposing a new set of metrics that consider structural and stylistic similarities, and 2) using Singular Value Decomposition (SVD) to remove noise and reduce the dimensionality of the data, thereby enhancing similarity detection.
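The abstract does not list the MFM metrics or the exact SVD configuration, so the following is only a minimal Python sketch, assuming an LSA-style use of SVD: rows of the matrix hold (already normalised) metric values, columns are files, only the k largest singular values are kept, and files are compared by cosine similarity in the reduced space. The function names, the rank `k` and the similarity `threshold` are hypothetical. To stay dependency-free, the sketch obtains the singular values and right singular vectors from a Jacobi eigendecomposition of M^T M; a practical implementation would more likely call a library routine such as `numpy.linalg.svd`.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def jacobi_eigen(a: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def reduced_file_vectors(mfm: Matrix, k: int) -> Matrix:
    """Hypothetical helper: coordinates of each file (column) in the top-k singular subspace.

    With M = U S V^T, file j maps to the entries of S V^T[:, j] that belong to the
    k largest singular values; squared singular values are eigenvalues of M^T M.
    """
    m, n = len(mfm), len(mfm[0])
    gram = [[sum(mfm[r][i] * mfm[r][j] for r in range(m)) for j in range(n)] for i in range(n)]
    eigvals, eigvecs = jacobi_eigen(gram)
    top = sorted(range(n), key=lambda i: eigvals[i], reverse=True)[:k]
    return [[math.sqrt(max(eigvals[i], 0.0)) * eigvecs[j][i] for i in top] for j in range(n)]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def suspicious_pairs(mfm: Matrix, k: int, threshold: float) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: file pairs whose reduced-space similarity reaches the threshold."""
    vecs = reduced_file_vectors(mfm, k)
    pairs = []
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            sim = cosine(vecs[i], vecs[j])
            if sim >= threshold:
                pairs.append((i, j, sim))
    return pairs


if __name__ == "__main__":
    # Toy metric-file matrix: 4 hypothetical metrics (rows) x 4 files (columns);
    # files 0 and 1 have almost identical metric values.
    mfm = [
        [1.0, 1.1, 0.0, 3.0],
        [0.0, 0.0, 4.0, 1.0],
        [3.0, 2.9, 0.0, 0.0],
        [2.0, 2.0, 1.0, 0.0],
    ]
    for i, j, sim in suspicious_pairs(mfm, k=2, threshold=0.95):
        print(f"files {i} and {j}: cosine similarity {sim:.3f}")
```

Keeping all singular values (k equal to the number of files) reproduces the cosine similarities of the original metric columns; lowering k discards the weakest directions, which is where the noise-removal effect of SVD comes from, at the cost of possibly discarding discriminating detail.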

The MFM framework was implemented and its performance was evaluated using the proposed metrics. For the evaluations, the SOCO dataset was adopted and the performance of the proposed framework was compared against other state-of-the-art plagiarism detection tools including JPlag.

Item Type: Thesis or Dissertation (PhD)
Subjects: Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software
Library of Congress Subject Headings (LCSH): Source code (Computer science), Plagiarism -- Software, Data sets, Java (Computer program language)
Official Date: November 2018
Dates: November 2018 (event: unspecified)
Institution: University of Warwick
Theses Department: Department of Computer Science
Thesis Type: PhD
Publication Status: Unpublished
Supervisor(s)/Advisor: Joy, Mike ; Cosma, Georgina
Format of File: pdf
Extent: xv, 160 leaves : illustrations, charts
Language: eng
