Universiti Teknologi Malaysia Institutional Repository

Improved letter weighting feature selection on Arabic script language identification

Ng, Choon-Ching and Selamat, Ali (2009) Improved letter weighting feature selection on Arabic script language identification. In: 2009 First Asian Conference on Intelligent Information and Database Systems. Article number 5175984. IEEE, pp. 150-154. ISBN 978-076953580-7

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1109/ACIIDS.2009.33

Abstract

Language identification is the process of automatically identifying which of a set of predefined languages a document is written in; this paper focuses on web documents. Initially, we applied letter frequency as features, combined with neural networks, to Arabic script language identification. However, the reliability of the selected letters as features is a major issue to be overcome. Therefore, we propose an improved letter weighting feature selection method to enhance the effectiveness of language identification. It is based on the concept of letter frequency document frequency. In our experiments, the improved letter weighting feature selection achieved the highest accuracy, 99.75%, on Arabic script language identification.
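
The abstract names the letter frequency document frequency concept but does not give its formula. As a rough illustration only, the sketch below assumes one plausible reading: a letter's weight is its mean relative frequency across a language's documents multiplied by the fraction of those documents that contain it. The function names `letter_weights` and `top_letters` and the weighting formula itself are hypothetical and are not taken from the paper.

```python
from collections import Counter


def letter_weights(documents):
    """Return {letter: weight} for the documents of one language.

    ASSUMPTION (not from the paper): weight = mean relative letter
    frequency * document frequency, where document frequency is the
    fraction of documents that contain the letter at least once.
    """
    n_docs = len(documents)
    lf_sum = Counter()  # summed per-document relative frequency of each letter
    df = Counter()      # number of documents containing each letter
    for text in documents:
        letters = [ch for ch in text if ch.isalpha()]
        if not letters:
            continue  # a document without letters contributes nothing
        total = len(letters)
        for ch, count in Counter(letters).items():
            lf_sum[ch] += count / total
            df[ch] += 1
    return {ch: (lf_sum[ch] / n_docs) * (df[ch] / n_docs) for ch in lf_sum}


def top_letters(documents, k):
    """Select the k highest-weighted letters (ties broken by code point)."""
    weights = letter_weights(documents)
    return sorted(weights, key=lambda ch: (-weights[ch], ch))[:k]
```

Under this assumed scheme, the letters returned by `top_letters` for each language would serve as the input features of a classifier such as the neural network mentioned in the abstract. A letter that is frequent in only a few documents is ranked below one that is moderately frequent in most of them, which is one way to address the reliability concern the abstract raises.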

Item Type: Book Section
Additional Information: 2009 1st Asian Conference on Intelligent Information and Database Systems, ACIIDS 2009; Dong Hoi; 1 April 2009 through 3 April 2009
Uncontrolled Keywords: document frequency, feature selection, language identification, web document
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania; Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 15106
Deposited By: Ms Zalinda Shuratman
Deposited On: 30 Sep 2011 15:08
Last Modified: 30 Sep 2011 15:08
