Universiti Teknologi Malaysia Institutional Repository

Text detergent: The systematic combination of text pre-processing techniques for social media sentiment analysis

Hair Zaki, Ummu Hani’ and Ibrahim, Roliana and Abd. Halim, Shahliza and Kamsani, Izyan Izzati (2022) Text detergent: The systematic combination of text pre-processing techniques for social media sentiment analysis. In: Advances on Intelligent Informatics and Computing Health Informatics, Intelligent Systems, Data Science and Smart Computing. Lecture Notes on Data Engineering and Communications Technologies, 127 (NA). Springer Science and Business Media Deutschland GmbH, Cham, Switzerland, pp. 50-61. ISBN 978-3-030-98740-4

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1007/978-3-030-98741-1_59

Abstract

During catastrophes such as natural or man-made disasters, social media services have evolved into a crucial tool that communities use to disseminate information. Because a vast amount of social media data is used for many applications, including sentiment analysis, the latter has become a very useful and demanding problem. Social media data cannot be used directly because it is raw and either unstructured or semi-structured. Consequently, text pre-processing becomes one of the most important tasks, because the process is strongly constrained by its dependable workflow. This constraint creates a complex pattern in pre-processing workflows. For this purpose, different text pre-processing techniques were applied to Twitter, Facebook, and YouTube datasets to study their impact on the accuracy of machine learning algorithms. This paper applied the techniques in a specific sequence based on significance testing and examined their influence on sentiment classification accuracy using a machine learning classifier, the Support Vector Machine (SVM). Results showed that applying all 14 techniques systematically can achieve an accuracy of up to 82.57% for the SVM classifier with unigram representations. Using Text Detergent, the YouTube dataset achieved the highest accuracy compared with the Facebook and Twitter datasets. This approach can potentially improve the quality of the text and lead to better feature extraction, which in turn helps the sentiment analyst produce a better classifier.
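The idea of applying pre-processing techniques in a fixed sequence can be illustrated with a minimal sketch. The abstract does not list the 14 techniques or their order, so the five steps below, their names, and their ordering are hypothetical stand-ins chosen only to show why sequence matters. The unigram counts produced at the end are the kind of representation that would then be passed to a classifier such as an SVM, which is not shown here.

```python
import re
from collections import Counter

# Hypothetical cleaning steps. The paper's actual 14 techniques and their
# order are not given in the abstract; these are illustrative only.
def remove_urls(text):
    return re.sub(r"https?://\S+", " ", text)

def remove_mentions(text):
    return re.sub(r"@\w+", " ", text)

def lowercase(text):
    return text.lower()

def remove_punctuation(text):
    # Replace anything that is neither a word character nor whitespace.
    return re.sub(r"[^\w\s]", " ", text)

def normalise_whitespace(text):
    return re.sub(r"\s+", " ", text).strip()

# Assumed order: URL and mention removal run before punctuation removal,
# because both rely on punctuation ("://", "@") to find their targets.
PIPELINE = [remove_urls, remove_mentions, lowercase,
            remove_punctuation, normalise_whitespace]

def clean(text, steps=PIPELINE):
    """Apply each step in order and return the cleaned text."""
    for step in steps:
        text = step(text)
    return text

def unigrams(text):
    """Count single-token (unigram) occurrences in cleaned text."""
    return Counter(text.split())

post = "@user Flood WARNING!! see http://example.com/x now"
good = clean(post)
# Same steps, but punctuation is stripped first, so the URL and the
# mention are no longer recognisable and their fragments survive.
bad = clean(post, [remove_punctuation, remove_urls, remove_mentions,
                   lowercase, normalise_whitespace])
print(good)
print(bad)
```

In this sketch the two orderings leave different tokens behind, which is one plausible reading of why the authors treat the order of techniques as something to be chosen systematically rather than arbitrarily.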

Item Type: Book Section
Uncontrolled Keywords: Accuracy, Classification, Social media, Support vector machine, Text pre-processing
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 99745
Deposited By: Widya Wahid
Deposited On: 19 Mar 2023 10:27
Last Modified: 04 Apr 2023 06:47
