Universiti Teknologi Malaysia Institutional Repository

Enhanced text stemmer with noisy text normalization for Malay texts

Kassim, Mohamad Nizam and Mat Jali, Shaiful Hisham and Maarof, Mohd. Aizaini and Zainal, Anazida and Abdul Wahab, Amirudin (2020) Enhanced text stemmer with noisy text normalization for Malay texts. In: Smart Trends in Computing and Communications: Proceedings of SmartCom 2019. Smart Innovation, Systems and Technologies, 165. Springer, Singapore, Gateway East, Singapore, pp. 433-444. ISBN 978-981-15-0076-3

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1007%2F978-981-15-0077-0_44

Abstract

In general, existing text stemmers for Malay were not developed to stem social media texts. There is therefore a need for an enhanced text stemmer that can map morphological variants based on the characteristics of non-standard derived word patterns found on social media platforms. Such a stemmer must deal with non-compliant word patterns (also called noisy texts or microtext), such as misspelled words and texting language, which are often used in informal conversation. This paper proposes an enhanced text stemmer for social media texts. The investigation focuses on different patterns of non-standard non-derived words (mechanics, non-standard word formation, code-switching, and slang words) as well as non-standard derived words. The experimental results show that the performance of the proposed text stemmer depends on how much “noise” the social media texts contain.

Item Type: Book Section
Uncontrolled Keywords: Stemming algorithm, Noisy texts
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 93031
Deposited By: Widya Wahid
Deposited On: 07 Nov 2021 05:59
Last Modified: 04 Apr 2023 07:43
