Universiti Teknologi Malaysia Institutional Repository

Enhanced rules application order to stem affixation, reduplication and compounding words in Malay texts

Kassim, M. N. and Maarof, M. A. and Zainal, A. and Wahab, A. A. (2016) Enhanced rules application order to stem affixation, reduplication and compounding words in Malay texts. In: 14th International Workshop on Knowledge Management and Acquisition for Intelligent Systems, PKAW2016, 22-23 Aug 2016, Phuket, Thailand.

Full text not available from this repository.

Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

A word stemmer is an automated program that removes affixes, clitics and particles from derived words based on the morphological structure of a specific natural language. Stemmers are widely used for text preprocessing in many artificial intelligence applications, and how accurately a stemmer reduces derived words to their roots influences the performance of information retrieval, text mining and text categorization applications. Although various stemming approaches have been proposed in past research, existing word stemmers for the Malay language still suffer from stemming errors. Moreover, existing stemmers only partially consider the morphological structure of Malay: they focus on affixation words rather than handling affixation, reduplication and compounding words together. Therefore, this paper proposes an enhanced word stemmer, called enhanced rule application order, that combines rule-based affix removal with dictionary lookup. It is able to stem affixation, reduplication and compounding words while also addressing possible stemming errors. This paper also examines possible root causes of the affixation, reduplication and compounding stemming errors that can occur during the word stemming process. The experimental results indicate that, by using the enhanced rule application order, the proposed word stemmer stems affixation, reduplication and compounding words with better stemming accuracy.
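
The abstract does not spell out the actual rules or their order, so the following is only a minimal illustrative sketch of the general idea of combining rule-based affix removal with dictionary lookup, not the authors' algorithm. The tiny root dictionary, the affix lists, the order in which rules are tried, and the function names `strip_affixes` and `stem` are all assumptions made for illustration. A compound word is handled here simply by listing it in the dictionary.

```python
# Toy root-word dictionary (assumed for illustration; a real stemmer
# would load a full Malay root-word list). "kerjasama" is a compound.
ROOTS = {"ajar", "baca", "buku", "main", "makan", "kerjasama"}

# Hypothetical affix lists, longest forms first so that "meng" is
# tried before "me". Clitics and particles are lumped with suffixes.
PREFIXES = ("meng", "mem", "men", "me", "ber", "be", "di", "ter")
SUFFIXES = ("kan", "an", "i", "nya", "lah")


def strip_affixes(word):
    """Try prefix/suffix removals in a fixed order and accept the
    first candidate that the dictionary confirms as a root."""
    if word in ROOTS:
        return word
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            candidate = rest[:len(rest) - len(suffix)] if suffix else rest
            if candidate in ROOTS:
                return candidate
    return word  # no rule produced a known root; leave unchanged


def stem(word):
    """Dictionary lookup first, then reduplication, then affixation."""
    word = word.lower()
    if word in ROOTS:
        return word
    if "-" in word:
        # Reduplication: stem both halves and accept if they agree.
        left, right = word.split("-", 1)
        left_root, right_root = strip_affixes(left), strip_affixes(right)
        if left_root == right_root and left_root in ROOTS:
            return left_root
    return strip_affixes(word)


if __name__ == "__main__":
    for w in ["makanan", "membaca", "buku-buku", "bermain-main", "bekerjasama"]:
        print(w, "->", stem(w))
```

Checking every candidate against the dictionary before accepting it is what keeps a rule from over-stemming: in this sketch, "dimakan" is not cut to "dima" by the "kan" rule because "dima" is not a known root, so the later "di" prefix rule gets a chance to produce "makan".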

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Malay word stemmer, Rules application order, Stemming error, Word stemmer, Word stemming algorithm
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 73493
Deposited By: Mohd Zulaihi Zainudin
Deposited On: 26 Nov 2017 03:37
Last Modified: 26 Nov 2017 03:37