
The Library
An approach to source-code plagiarism detection investigation using latent semantic analysis
Tools
Cosma, Georgina (2008) An approach to source-code plagiarism detection investigation using latent semantic analysis. PhD thesis, University of Warwick.
![]() |
PDF
WRAP_THESIS_Cosma_2008.pdf - Requires a PDF viewer. Download (1663Kb) |
![]() |
Other (Permission e-mail)
FW__Digitisation_of_your_PhD_Thesis_.msg Embargoed item. Restricted access to Repository staff only Download (1679Kb) |
Official URL: http://webcat.warwick.ac.uk/record=b2248352~S15
Abstract
This thesis looks at three aspects of source-code plagiarism. The first aspect of the
thesis is concerned with creating a definition of source-code plagiarism; the second aspect
is concerned with describing the findings gathered from investigating the Latent Semantic
Analysis information retrieval algorithm for source-code similarity detection; and the final
aspect of the thesis is concerned with the proposal and evaluation of a new algorithm that
combines Latent Semantic Analysis with plagiarism detection tools.
A recent review of the literature revealed that there is no commonly agreed definition of
what constitutes source-code plagiarism in the context of student assignments. This thesis
first analyses the findings from a survey carried out to gather an insight into the perspectives
of UK Higher Education academics who teach programming on computing courses. Based
on the survey findings, a detailed definition of source-code plagiarism is proposed.
Secondly, the thesis investigates the application of an information retrieval technique,
Latent Semantic Analysis, to derive semantic information from source-code files. Various
parameters drive the effectiveness of Latent Semantic Analysis. The performance of Latent
Semantic Analysis using various parameter settings and its effectiveness in retrieving
similar source-code files when optimising those parameters are evaluated.
Finally, an algorithm for combining Latent Semantic Analysis with plagiarism detection
tools is proposed and a tool is created and evaluated. The proposed tool, PlaGate, is
a hybrid model that allows for the integration of Latent Semantic Analysis with plagiarism
detection tools in order to enhance plagiarism detection. In addition, PlaGate has a facility
for investigating the importance of source-code fragments with regards to their contribution
towards proving plagiarism. PlaGate provides graphical output that indicates the clusters of
suspicious files and source-code fragments.
Item Type: | Thesis (PhD) | ||||
---|---|---|---|---|---|
Subjects: | Q Science > QA Mathematics > QA76 Electronic computers. Computer science. Computer software | ||||
Library of Congress Subject Headings (LCSH): | Plagiarism -- Computer programs, Source code (Computer science), Latent semantic indexing | ||||
Official Date: | July 2008 | ||||
Dates: |
|
||||
Institution: | University of Warwick | ||||
Theses Department: | Department of Computer Science | ||||
Thesis Type: | PhD | ||||
Publication Status: | Unpublished | ||||
Supervisor(s)/Advisor: | Joy, Mike ; Russ, Steve | ||||
Extent: | xix, 289 leaves : ill., charts | ||||
Language: | eng |
Request changes or add full text files to a record
Repository staff actions (login required)
![]() |
View Item |