Universiti Teknologi Malaysia Institutional Repository

Arabic language script and encoding identification with support vector machines and rough set theory

Mohamed Sidya, Mohamed Ould (2007) Arabic language script and encoding identification with support vector machines and rough set theory. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information System.

Full text (PDF, 1007 Kb): restricted to repository staff only
Abstract (PDF, 163 Kb)
Table of Contents (PDF, 183 Kb)
Chapter 1 (PDF, 278 Kb)

Abstract

Arabic ranks sixth among the world’s spoken languages, with more than 230 million speakers across the Arab world. Arabic has several varieties and dialects; the most widely spoken is Egyptian Arabic, with more than 50 million speakers. Although only a small proportion of Arabic speakers use the internet, they still make up a considerable share of the internet community. So far, however, there has been no research on automatically distinguishing Arabic from the other languages that use the same script. This project addresses the task of distinguishing Arabic from Persian, both of which are written in the Arabic script. The data for this project were collected from the internet, in particular from the BBC website, and were preprocessed with several operations, including stop-word removal and stemming. The project compares the performance of Support Vector Machines and Rough Set Theory in identifying Arabic. The results show that both methods perform well, but Support Vector Machines outperform Rough Set Theory.
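The abstract does not describe the feature set, stemmer, or SVM implementation used in the thesis, so the following is only a minimal sketch of the general pipeline it names (stop-word removal, stemming, then a linear SVM deciding Arabic vs Persian). Everything concrete here is an assumption for illustration: the tiny `STOP_WORDS` list, the hypothetical `light_stem` function (which only strips the Arabic article "ال"), term-frequency features, and training with Pegasos (stochastic sub-gradient descent on the regularised hinge loss, without a bias term). The toy sentences are made up and are not the BBC corpus. The Rough Set Theory classifier is not shown.

```python
import random
from collections import Counter

# Hypothetical toy stop-word list (Arabic and Persian function words);
# the thesis's actual stop-word list is not reproduced here.
STOP_WORDS = {"و", "في", "من", "على", "در", "از", "به", "که", "را"}


def light_stem(token):
    """Hypothetical light stemmer: strip the Arabic definite article 'ال'."""
    if token.startswith("ال") and len(token) > 3:
        return token[2:]
    return token


def preprocess(text):
    """Whitespace tokenisation, stop-word removal, then light stemming."""
    return [light_stem(t) for t in text.split() if t not in STOP_WORDS]


def build_vocab(docs):
    """Map every preprocessed token in the training documents to a feature index."""
    vocab = {}
    for doc in docs:
        for tok in preprocess(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Sparse term-frequency vector {feature index: count}; unknown tokens are dropped."""
    counts = Counter(preprocess(text))
    return {vocab[t]: c for t, c in counts.items() if t in vocab}


def train_linear_svm(X, y, dim, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss (no bias)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(w[j] * v for j, v in X[i].items())
            # Regularisation step: scale w by (1 - eta * lam), which equals (1 - 1/t).
            w = [(1.0 - 1.0 / t) * wj for wj in w]
            # Hinge-loss step: only samples inside the margin move w.
            if margin < 1:
                for j, v in X[i].items():
                    w[j] += eta * y[i] * v
    return w


def predict(text, w, vocab):
    """Return 'arabic' (+1 side), 'persian' (-1 side), or 'unknown' if no known features."""
    score = sum(w[j] * v for j, v in vectorize(text, vocab).items())
    if score > 0:
        return "arabic"
    if score < 0:
        return "persian"
    return "unknown"


# Made-up toy sentences for illustration only (label +1 = Arabic, -1 = Persian).
train_docs = [
    ("ذهب الطالب إلى المدرسة", 1),
    ("قال الرئيس في المؤتمر الصحفي", 1),
    ("هذا الكتاب مفيد جدا", 1),
    ("من به مدرسه رفتم", -1),
    ("رئیس جمهور در کنفرانس خبری گفت", -1),
    ("این کتاب خیلی مفید است", -1),
]

vocab = build_vocab(text for text, _ in train_docs)
X = [vectorize(text, vocab) for text, _ in train_docs]
y = [label for _, label in train_docs]
w = train_linear_svm(X, y, dim=len(vocab))

print(predict("ذهب الطالب", w, vocab))   # arabic
print(predict("جمهور گفت", w, vocab))    # persian
print(predict("hello world", w, vocab))  # unknown
```

Even in this toy data, words that look alike can be distinct tokens because Persian uses its own code points for some letters (for example Persian yeh "ی" and keheh "ک" versus Arabic "ي" and "ك"), which is one reason word-level features can separate the two languages despite the shared script.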

Item Type: Thesis (Masters)
Additional Information: Thesis (Master of Science (Computer Science)), Universiti Teknologi Malaysia, 2007; Supervisor: Dr. Ali Bin Selamat
Uncontrolled Keywords: language identification, Arabic script, support vector machines, rough set theory
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 6795
Deposited By: Ms Zalinda Shuratman
Deposited On: 25 Nov 2008 03:23
Last Modified: 20 Sep 2012 05:47
