Selamat, Ali and Ng, Choon Ching and Mikami, Yoshiki (2008) Arabic script web documents language identification using decision tree-ARTMAP model. In: International Conference on Convergence Information Technology, 21–23 November 2007.
Full text not available from this repository.
Official URL: http://dx.doi.org/10.1109/ICCIT.2007.402
Automatic language identification (LID) is highly significant in intelligence and security applications, where the language of any relevant material must be identified before its information can be processed. When the elements to be recognized are dynamic and obtained directly from written text, the language associated with each grammar item has to be identified from that text. Many methods proposed in the literature focus on Roman-script and Asian languages. This paper describes text-based language identification approaches for Arabic script and compares two of them. First, the decision tree method, which is commonly used in many application domains, is reviewed. Second, a simple language identification method based on an Adaptive Resonance Learning (ART) neural network, ARTMAP, is applied. The experimental results show that the decision tree model achieved higher accuracy than the ARTMAP model. However, the decision tree model may be less reliable than the ARTMAP model if the set of languages is extended to other languages written in Arabic script. The authors suggest that a hybrid of the two models may perform better and merits further development.
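To make the ARTMAP side of this comparison concrete, here is a minimal sketch of a simplified Fuzzy ARTMAP (complement coding, fast learning, match tracking) applied to Arabic-script text. The paper's actual feature set, ARTMAP variant and parameter values are not given in this record, so everything below is an assumption: `MARKERS` is a hypothetical list of letters that occur in only some Arabic-script languages (Arabic, Persian, Urdu), features are simple binary presence flags, and the class name, parameters and sample sentences are illustrative only.

```python
# Toy Arabic-script language identifier using a simplified Fuzzy ARTMAP.
# Assumptions (not from the paper): binary presence features over a
# hypothetical marker-letter list, complement coding, beta = 1, match tracking.

MARKERS = [
    "\u0629",  # teh marbuta (common in Arabic)
    "\u067E",  # peh (Persian, Urdu)
    "\u0686",  # tcheh (Persian, Urdu)
    "\u06AF",  # gaf (Persian, Urdu)
    "\u0698",  # jeh (Persian)
    "\u0679",  # tteh (Urdu)
    "\u0688",  # ddal (Urdu)
    "\u0691",  # rreh (Urdu)
    "\u06BA",  # noon ghunna (Urdu)
    "\u06D2",  # yeh barree (Urdu)
]


def features(text):
    """Binary vector: 1.0 if the marker letter appears anywhere in the text."""
    return [1.0 if ch in text else 0.0 for ch in MARKERS]


def complement_code(x):
    """Fuzzy ART complement coding I = (x, 1 - x); |I| always equals len(x)."""
    return x + [1.0 - v for v in x]


class SimplifiedFuzzyARTMAP:
    def __init__(self, rho=0.75, alpha=0.001, beta=1.0, eps=0.001):
        self.rho = rho      # baseline vigilance
        self.alpha = alpha  # choice parameter
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.eps = eps      # match-tracking increment
        self.weights = []   # one weight vector per committed category
        self.labels = []    # language label mapped to each category

    def _rank(self, I):
        """Return (T_j, |I ^ w_j|, j) tuples sorted by descending choice T_j."""
        ranked = []
        for j, w in enumerate(self.weights):
            overlap = sum(min(a, b) for a, b in zip(I, w))  # fuzzy AND size
            ranked.append((overlap / (self.alpha + sum(w)), overlap, j))
        return sorted(ranked, reverse=True)

    def train_one(self, x, label):
        I = complement_code(x)
        rho = self.rho
        for _, overlap, j in self._rank(I):
            match = overlap / sum(I)
            if match < rho:
                continue  # fails the vigilance test; try the next category
            if self.labels[j] == label:
                # Resonance with the correct label: w <- beta(I ^ w) + (1 - beta)w
                w = self.weights[j]
                self.weights[j] = [self.beta * min(a, b) + (1 - self.beta) * b
                                   for a, b in zip(I, w)]
                return
            rho = match + self.eps  # match tracking: raise vigilance, keep searching
        # No existing category fits: commit a new one for this label.
        self.weights.append(list(I))
        self.labels.append(label)

    def fit(self, texts, labels, epochs=1):
        for _ in range(epochs):
            for text, label in zip(texts, labels):
                self.train_one(features(text), label)
        return self

    def predict(self, text):
        if not self.weights:
            raise ValueError("model has not been trained")
        _, _, j = self._rank(complement_code(features(text)))[0]
        return self.labels[j]


train_texts = [
    "المدرسة كبيرة", "هذه مدينة جميلة",            # Arabic
    "پدر من بزرگ است", "این کتاب چهار صفحه دارد",  # Persian
    "یہ بڑا گھر ہے", "میں ٹرین سے جاؤں گا",          # Urdu
]
train_labels = ["ar", "ar", "fa", "fa", "ur", "ur"]

model = SimplifiedFuzzyARTMAP().fit(train_texts, train_labels)
print(model.predict("وہ بڑے شہر میں رہتا ہے"))  # prints "ur"
```

In this sketch a new category is committed whenever no existing one passes the vigilance test with the right label, which is what lets an ARTMAP-style model absorb languages it has not seen before without retraining from scratch. A decision tree trained on the same features would instead need to be rebuilt when the language set grows, which is one plausible reading of the reliability concern raised in the abstract; that interpretation is ours, not the authors'.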
|Item Type:||Conference or Workshop Item (Paper)|
|Uncontrolled Keywords:||Adaptive Resonance Learning (ART) neural networks, ARTMAP, decision tree, text-based language identification, Arabic script|
|Subjects:||Q Science > QA Mathematics > QA75 Electronic computers. Computer science|
|Divisions:||Computer Science and Information System|
|Deposited By:||Ms Zalinda Shuratman|
|Deposited On:||28 Nov 2008 01:52|
|Last Modified:||28 Nov 2008 03:21|