Universiti Teknologi Malaysia Institutional Repository

Improved semantic graph-based plagiarism detection

Osman Ahmed, Ahmed Hamza (2013) Improved semantic graph-based plagiarism detection. PhD thesis, Universiti Teknologi Malaysia, Faculty of Computing.


Official URL: http://dms.library.utm.my:8080/vital/access/manage...


Plagiarism occurs when the content of a text is copied without permission or citation. Nowadays, many text documents on the internet are easily accessed and copied. This study proposes improved methods for detecting plagiarism. The proposed plagiarism detection methods are developed using graph-based representation and semantic role labeling, which are improved using a fuzzy logic technique and chi-squared automatic interaction detection. The graph-based method not only represents the content of a text document as a graph, but also captures the underlying semantic meaning in terms of the relationships among its concepts. Semantic role labeling is superior at generating semantic arguments for each sentence. It plays an important part in plagiarism detection because it segments the roles of concepts in documents into labels, which are then compared to detect plagiarism. Another feature of this study is that the fuzzy logic method scores each argument so that the important arguments can be selected. The chi-squared automatic interaction detection technique was applied to reinforce the results obtained from fuzzy logic and semantic role labeling by selecting important arguments from the sentences. It is concluded that not all arguments in the text are useful in the plagiarism detection process. Therefore, only the most important arguments were selected by fuzzy logic and chi-squared automatic interaction detection, and the results were used in the similarity calculation process. Experiments were conducted on the PAN-PC-2009 corpus as the standard artificially simulated corpus and on the Short Answers Questions (CS11) corpus as the human-simulated corpus for plagiarism detection. The proposed methods detected many types of plagiarism, such as copy-paste plagiarism, rewording or synonym replacement, changing of word structure in the sentences, and modifying the sentence from passive voice to active voice and vice versa.
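To make the idea of comparing role-labeled arguments concrete, the following is a minimal sketch and not the method used in the thesis. It assumes that sentences have already been labeled by some semantic role labeler (the labels here are written by hand), uses Jaccard overlap as a stand-in similarity measure, and introduces a hypothetical `weights` parameter to play the part of argument-importance scores. All function and variable names are illustrative.

```python
# Illustrative sketch only: role labels are hand-written, and Jaccard overlap
# stands in for the similarity calculation, which the abstract does not specify.

def jaccard(a, b):
    """Jaccard overlap of two token collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def role_similarity(source, suspicious, weights=None):
    """Weighted mean of per-role Jaccard overlaps between two labeled sentences.

    Each sentence is a dict mapping a role label (e.g. "ARG0") to its tokens.
    `weights` is a hypothetical stand-in for argument-importance scores.
    """
    roles = set(source) | set(suspicious)
    if weights is None:
        weights = {role: 1.0 for role in roles}
    total = sum(weights.get(role, 0.0) for role in roles)
    if total == 0:
        return 0.0
    score = sum(
        weights.get(role, 0.0)
        * jaccard(source.get(role, ()), suspicious.get(role, ()))
        for role in roles
    )
    return score / total


# "The student copied the essay." (active voice)
active = {"ARG0": ["student"], "V": ["copy"], "ARG1": ["essay"]}
# "The essay was copied by the student." (passive voice, same roles)
passive = {"ARG0": ["student"], "V": ["copy"], "ARG1": ["essay"]}
# "The teacher graded the essay."
unrelated = {"ARG0": ["teacher"], "V": ["grade"], "ARG1": ["essay"]}

print(role_similarity(active, passive))    # identical roles despite the voice change
print(role_similarity(active, unrelated))  # only ARG1 overlaps
```

Because the comparison is made role by role rather than word position by word position, the active and passive variants receive the same labeled arguments and therefore match fully, which is the kind of rewrite the abstract says the proposed methods can detect.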
Compared with other plagiarism detection techniques (Fuzzy Semantic-Based String Similarity and Longest Common Subsequence), the proposed methods achieved better performance in the experiments in terms of recall (93%), precision (90%) and F-measure (91%).
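As a quick consistency check on the reported figures, the sketch below computes the F-measure from precision and recall. It assumes the thesis uses the balanced F-measure (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name and the `beta` parameter are illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported in the abstract.
reported = f_measure(precision=0.90, recall=0.93)
print(f"{reported:.4f}")  # 0.9148
```

Under that assumption the result is about 0.91, which agrees with the reported F-measure of 91%.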

Item Type: Thesis (PhD)
Additional Information: Thesis (Ph.D (Sains Komputer)) - Universiti Teknologi Malaysia, 2013; Supervisor: Prof. Dr. Naomie Salim
Uncontrolled Keywords: plagiarism, semantic computing
Subjects: P Language and Literature > PN Literature (General)
ID Code: 33795
Deposited By: Kamariah Mohamed Jong
Deposited On: 28 Nov 2013 18:47
Last Modified: 23 Jul 2017 12:27
