Universiti Teknologi Malaysia Institutional Repository

Bar chart plagiarism detection

Mohammed Salih, Mohammed Mumtaz (2013) Bar chart plagiarism detection. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems.



Plagiarism is a form of electronic crime and intellectual theft, and it has become an educational challenge for research institutions. Charts such as line and bar charts are one way to represent quantitative information in infographic form. Extracting the features of a bar chart is an essential step in recovering the data from an image. Techniques presented by earlier researchers, such as the Hough Transform and learning-based methods, focus on the graphical part of the chart rather than on the text itself. In this study, ten features of bar chart images are used to detect similarity between charts and to measure its proportion. Some of these features can be extracted directly by OCR, while others, such as the real value of each bar, require relating the text part of the image to its graphical part. The new technique introduced in this research extracts three values for each bar, namely the Start, End and Exact values, based on the horizontal and vertical lines of the bar chart image. In addition, the Word 2-gram and Euclidean distance methods are used to detect plagiarism. Experimental results show that the system can detect ten possible patterns of bar chart plagiarism. Its performance is evaluated in terms of overlapping features, precision and recall. The results show that the system detects not only copied and pasted bar data, but also restructured and summarized image captions, as well as modifications to the data of bar chart images, such as swapping bars, changing colors and changing scales.
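
The two comparison methods named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how the Word 2-gram overlap is scored or how bar values are aligned, so the Jaccard coefficient, the sorting step used to tolerate swapped bars, and all function names below are assumptions made for illustration.

```python
import math


def word_bigrams(text):
    """Return the set of adjacent word pairs in a lowercased caption."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def bigram_similarity(caption_a, caption_b):
    """Jaccard overlap of word 2-grams, in [0, 1] (assumed scoring)."""
    a, b = word_bigrams(caption_a), word_bigrams(caption_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def euclidean_distance(values_a, values_b):
    """Euclidean distance between two equal-length lists of bar values."""
    if len(values_a) != len(values_b):
        raise ValueError("charts must have the same number of bars")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(values_a, values_b)))


def order_insensitive_distance(values_a, values_b):
    """Compare sorted values so that swapped bars still match (assumption)."""
    return euclidean_distance(sorted(values_a), sorted(values_b))
```

For example, `bigram_similarity("annual sales by region", "annual sales by country")` shares two of four distinct word pairs, and `order_insensitive_distance([10, 20, 30], [30, 10, 20])` is zero because the two charts hold the same values in a different order.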

Item Type: Thesis (Masters)
Additional Information: Thesis (Sarjana Sains (Sains Komputer)) - Universiti Teknologi Malaysia, 2013; Supervisor: Prof. Dr. Naomie Salim
Uncontrolled Keywords: plagiarism, semantic computing
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Computer Science and Information System (Formerly known)
ID Code: 39164
Deposited On: 25 Jun 2014 12:03
Last Modified: 13 Sep 2017 14:44
