Universiti Teknologi Malaysia Institutional Repository

Offline Arabic character recognition using genetic approach

Aljuaid, Hanan Abdulrahman (2010) Offline Arabic character recognition using genetic approach. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information System.



Many optical character recognition (OCR) techniques and tools have been developed for a wide range of languages. A successful OCR system improves interaction between humans and computers in applications such as digitising and recognising written content. For Arabic OCR, handwriting recognition is challenging because Arabic letters are cursive and change shape depending on their position. OCR systems have reached nearly perfect recognition of printed Arabic text, but recognition of handwritten text is still in its infancy and needs to be greatly improved. This study therefore proposes an approach to recognising Arabic characters based on genetic algorithms (GA). The approach has two separate stages: feature extraction, and GA-based character recognition. In the feature extraction stage, six features are detected for each character and represented as a feature vector of six integers. These feature vectors are then used in the next stage, where three genetic operators, namely selection, crossover and mutation, are applied to search for the similar vectors with the best fitness value in order to recognise the character. The data used in this study were collected from different sources and stored in a database. They consist of 12,500 printed words in 50 paragraphs and 15,000 words handwritten by 100 different writers, male and female, aged 5 to 60 years. Pre-processing operations include segmenting paragraphs into lines, lines into words and words into characters, detecting the skeleton, and determining the baseline and other horizontal zones. The experimental results show that the proposed method achieves a promising recognition accuracy of 90.46% for printed text and handwritten characters.
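The recognition stage described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not name the six features, the fitness function, or the GA parameters, so everything below is an assumption made for illustration. The labels and feature values in `REFERENCE` are made up, fitness is taken to be the negative Manhattan distance to the query vector, selection is a tournament, crossover is single-point, and mutation copies a gene from a random reference vector. The names `recognise`, `select`, `crossover` and `mutate` are hypothetical.

```python
import random

# Hypothetical reference database: label -> feature vector of 6 integers.
# The values are invented; the abstract does not list the actual features.
REFERENCE = {
    "alif": (1, 0, 0, 0, 5, 0),
    "ba":   (2, 1, 1, 0, 2, 3),
    "jim":  (3, 1, 2, 1, 4, 7),
    "dal":  (1, 0, 0, 1, 3, 9),
}


def distance(a, b):
    """Manhattan distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def fitness(individual, query):
    """Assumed fitness: higher (closer to zero) means more similar to the query."""
    return -distance(individual, query)


def select(population, query, rng, k=3):
    """Tournament selection: the fittest of k randomly drawn individuals."""
    contestants = rng.sample(population, k)
    return max(contestants, key=lambda ind: fitness(ind, query))


def crossover(parent_a, parent_b, rng):
    """Single-point crossover of two 6-gene vectors."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(individual, rng, rate=0.1):
    """With probability `rate` per gene, copy that gene from a random reference vector."""
    refs = list(REFERENCE.values())
    genes = list(individual)
    for i in range(len(genes)):
        if rng.random() < rate:
            genes[i] = rng.choice(refs)[i]
    return tuple(genes)


def recognise(query, generations=30, pop_size=20, seed=0):
    """Evolve vectors towards the query, then return (label, best vector)."""
    rng = random.Random(seed)
    refs = list(REFERENCE.values())
    # Seed the population with every reference vector, padded with random copies.
    population = refs + [rng.choice(refs) for _ in range(pop_size - len(refs))]
    for _ in range(generations):
        best = max(population, key=lambda ind: fitness(ind, query))
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            child = crossover(select(population, query, rng),
                              select(population, query, rng), rng)
            children.append(mutate(child, rng))
        population = children
    best = max(population, key=lambda ind: fitness(ind, query))
    # The recognised character is the reference label closest to the best vector.
    label = min(REFERENCE, key=lambda name: distance(REFERENCE[name], best))
    return label, best


label, best = recognise((2, 1, 1, 0, 2, 4))
print(label, best)
```

With a reference set this small, a direct scan over `REFERENCE` would be simpler than a GA. The sketch is only meant to show how selection, crossover and mutation fit together around a six-integer feature vector.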

Item Type: Thesis (Masters)
Additional Information: Supervisor: Prof. Dr. Dzulkifli Mohamad; Thesis (Sarjana Sains (Sains Komputer)) - Universiti Teknologi Malaysia 2010
Uncontrolled Keywords: optical character recognition
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 16567
Deposited By: Zalinda Shuratman
Deposited On: 16 Jan 2012 09:56
Last Modified: 18 Sep 2017 06:07
