Unigram language identifications using adaptive neutral network

Selamat, Ali and Ng, Choon-Ching (2008) Unigram language identifications using adaptive neutral network. In: Proceedings - International Symposium on Information Technology 2008, ITSim. Institute of Electrical and Electronics Engineers, New York, 1078-1082. ISBN 978-142442328-6

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1109/ITSIM.2008.4631694

Abstract

In general, a web page may contain several scripts, and each script can be used to write different languages. Determining the languages of a document is required in order to apply many search and information retrieval techniques effectively. In this work, we propose a hybrid-grams feature selection method that integrates unigrams and bigrams. The method uses local statistical information within a document to determine its language. In our experiments, hybrid-grams outperformed unigrams and bigrams in Cyrillic and Indic script language identification.
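
The idea of combining unigram and bigram statistics from a single document can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`hybrid_grams`, `cosine`, `identify`), the tiny Russian and Ukrainian sample sentences, and the cosine-similarity matching are all assumptions made for illustration, and the classifier named in the paper's title is not reproduced here.

```python
import math
from collections import Counter


def hybrid_grams(text):
    """Relative frequencies of character unigrams and bigrams, in one feature map.

    Bigrams are taken within whitespace-separated words only (an assumption).
    """
    counts = Counter()
    for word in text.lower().split():
        counts.update(word)                                    # unigrams
        counts.update(a + b for a, b in zip(word, word[1:]))   # bigrams
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse frequency maps."""
    dot = sum(v * q.get(gram, 0.0) for gram, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def identify(text, profiles):
    """Return the language whose profile is most similar to the text."""
    doc = hybrid_grams(text)
    return max(profiles, key=lambda lang: cosine(doc, profiles[lang]))


# Hypothetical training snippets for two languages that share the Cyrillic script.
SAMPLES = {
    "ru": "это простой пример текста на русском языке",
    "uk": "це простий приклад тексту українською мовою",
}
PROFILES = {lang: hybrid_grams(sample) for lang, sample in SAMPLES.items()}
```

Calling `identify(some_text, PROFILES)` returns the key of the closest profile. Real use would need far larger training samples per language than the two sentences assumed here.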

Item Type:	Book Section
Additional Information:	ISBN: 978-142442328-6; International Symposium on Information Technology 2008, ITSim; Kuala Lumpur; 26 August 2008 through 29 August 2008
Uncontrolled Keywords:	feature extraction, information services, information technology, linguistics
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:	Computer Science and Information System
ID Code:	12790
Deposited By:	Liza Porijo
Deposited On:	29 Jun 2011 07:52
Last Modified:	29 Jun 2011 07:52
