Universiti Teknologi Malaysia Institutional Repository

Using linguistic patterns in FCA-based approach for automatic acquisition of taxonomies from Malay text

Ahmad Nazri, Mohd Zakree and Abu Bakar, Azuraliza and Shamsudin, Siti Mariyam and Abd Ghani, Tarmizi (2008) Using linguistic patterns in FCA-based approach for automatic acquisition of taxonomies from Malay text. In: Proceedings - International Symposium on Information Technology 2008, ITSim. Institute of Electrical and Electronics Engineers, New York, 1173-1179. ISBN 978-142442328-6

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1109/ITSIM.2008.4631709

Abstract

Previous work has shown that Formal Concept Analysis (FCA) can be used to automatically acquire taxonomies from Indo-European text. The taxonomies are built via FCA using syntactic dependencies, such as verb/head-object, verb/head-subject and verb/prepositional phrase-complement, as attributes. This paper discusses the overall process of learning a taxonomy using FCA with the same syntactic dependencies used for English, applied to Malay texts. Malay, an Austronesian language, follows the same Subject-Verb-Object sentence structure as English but is syntactically different. The results show lower recall and precision compared to related work in other languages. The poor results are caused by several factors, such as the choice of smoothing technique: the experiments indicate that the current smoothing technique does not produce good results with FCA. Therefore, in addition to the syntactic dependencies, we use linguistic patterns such as Hearst's patterns to find similarities between terms. We compare the results of our technique against the cosine measure used in the FCA-based taxonomy learning approach. The proposed technique attains both higher precision and higher recall than the previous technique.
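
As a rough illustration of the FCA step the abstract refers to, the sketch below builds a formal context from terms and their syntactic-dependency attributes, enumerates the formal concepts, and computes a cosine score between two terms. The terms, the attribute names (`book_obj`, `rent_obj`, and so on) and all function names are hypothetical toy choices, not data or code from the paper, and the brute-force enumeration is only practical for very small contexts.

```python
from itertools import combinations
from math import sqrt

# Toy formal context (hypothetical, not from the paper): each term maps to
# the verb/object-style attributes it was observed with.
CONTEXT = {
    "hotel": {"book_obj"},
    "apartment": {"book_obj", "rent_obj"},
    "car": {"book_obj", "rent_obj", "drive_obj"},
    "bike": {"book_obj", "rent_obj", "drive_obj", "ride_obj"},
}


def extent(attrs, context):
    """Return the terms that carry every attribute in attrs."""
    return frozenset(t for t, a in context.items() if attrs <= a)


def intent(terms, context):
    """Return the attributes shared by every term in terms."""
    shared = set().union(*context.values())
    for term in terms:
        shared &= context[term]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every attribute subset.

    Exponential in the number of attributes, so suitable for toy data only.
    """
    attributes = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(attributes) + 1):
        for subset in combinations(attributes, size):
            ext = extent(set(subset), context)
            concepts.add((ext, intent(ext, context)))
    return concepts


def cosine(term_a, term_b, context):
    """Cosine similarity of two terms over binary attribute vectors."""
    a, b = context[term_a], context[term_b]
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


if __name__ == "__main__":
    # Larger extents are more general concepts; ordering by extent size
    # gives a simple top-down view of the resulting hierarchy.
    for ext, itt in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(ext), sorted(itt))
    print(cosine("hotel", "bike", CONTEXT))
```

Ordering the concepts by extent inclusion yields the concept lattice from which a taxonomy can be read off; the cosine function stands in for the kind of term similarity that the abstract compares against pattern-based evidence.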

Item Type: Book Section
Additional Information: ISBN: 978-142442328-6; International Symposium on Information Technology 2008, ITSim; Kuala Lumpur; 26 August 2008 through 29 August 2008
Uncontrolled Keywords: information technology, linguistics, query languages, steel beams and girders, syntactics, technology transfer
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 12794
Deposited By: Liza Porijo
Deposited On: 29 Jun 2011 10:07
Last Modified: 29 Jun 2011 10:07
