Universiti Teknologi Malaysia Institutional Repository

Document plagiarism detection algorithm using semantic networks

Ahmed Muftah, Ahmed Jabr (2009) Document plagiarism detection algorithm using semantic networks. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems.


Abstract

The vast increase in documents available on the World Wide Web (WWW) and the easy access to these documents have led to a serious problem of using others’ work without giving credit. Although many methods have been developed to detect some forms of plagiarism, such as changing the structure of sentences or replacing a few words with their synonyms, it is often hard to reveal plagiarism when the copied sentences are deliberately modified. This project proposes an algorithm for plagiarism detection over the Web using semantic networks. The corpus of this study contains 610 documents downloaded from the Web, 10 of which were selected as the sources of 20 manually plagiarized documents. The algorithm was compared with an N-gram representation, and the results show that an appropriate semantic representation of sentences derived from WordNet’s relations outperforms N-grams with different similarity measures in detecting the plagiarized sentences. They also show that a proposed method based on extracting named entities and common nouns is in general capable of retrieving the source documents from the Web using a search engine API when sentences are moderately plagiarized.
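As an illustration of the N-gram baseline mentioned in the abstract, the following minimal sketch compares two sentences by the Jaccard overlap of their word N-grams. It is not the thesis's implementation: the function names, the tokenization rule, the choice of Jaccard as the similarity measure, and the example sentences are all assumptions made for this example.

```python
import re


def word_ngrams(sentence, n=3):
    """Return the set of word n-grams in a sentence.

    Tokenization (lowercase, letters and digits only) is an assumption
    made for this sketch, not a rule taken from the thesis.
    """
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ngram_similarity(s1, s2, n=3):
    """Similarity of two sentences based on shared word n-grams."""
    return jaccard(word_ngrams(s1, n), word_ngrams(s2, n))


# Hypothetical example sentences: an exact copy and a synonym-substituted copy.
source = "The quick brown fox jumps over the lazy dog"
copied = "The quick brown fox jumps over the lazy dog"
reworded = "A fast brown fox leaps over the idle dog"

exact_score = ngram_similarity(source, copied)        # identical trigram sets
reworded_score = ngram_similarity(source, reworded)   # no trigram survives the rewording
```

In this example the exact copy scores 1.0, while the reworded sentence shares no trigram with the source and scores 0.0, even though it says the same thing. This is the kind of case in which a representation built on WordNet relations, as the abstract proposes, could still relate "quick" to "fast" or "jumps" to "leaps".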

Item Type: Thesis (Masters)
Additional Information: Thesis (Sarjana Sains (Sains Komputer)) - Universiti Teknologi Malaysia, 2009; Supervisor: Assoc. Prof. Dr. Naomie
Uncontrolled Keywords: world wide web (WWW), plagiarism, algorithm, semantic networks
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 11433
Deposited By: Ms Zalinda Shuratman
Deposited On: 15 Dec 2010 07:38
Last Modified: 19 Jul 2012 01:09
