Hair Zaki, Ummu Hani’ and Ibrahim, Roliana and Abd. Halim, Shahliza and Kamsani, Izyan Izzati (2022) Text detergent: The systematic combination of text pre-processing techniques for social media sentiment analysis. Lecture Notes on Data Engineering and Communications Technologies, 127 (NA). pp. 50-61. ISSN 2367-4512
Full text not available from this repository.
Official URL: http://dx.doi.org/10.1007/978-3-030-98741-1_5
Abstract
During catastrophes such as natural or man-made disasters, social media services have evolved into a crucial tool used by communities to disseminate information. Because a vast amount of social media data is being used for many applications, sentiment analysis among them, sentiment analysis has become a very useful and demanding problem. Social media data cannot be used directly because it is raw and either unstructured or semi-structured. Consequently, text pre-processing becomes one of the most important tasks, and the process is strongly constrained by the dependencies within its workflow. These dependencies create complex patterns in pre-processing workflows. For this purpose, different text pre-processing techniques have been applied to Twitter, Facebook, and YouTube datasets to study their impact on the accuracy of machine learning algorithms. This paper applies different text pre-processing techniques in a specific sequence based on significance testing, and examines their influence on sentiment classification accuracy using a machine learning classifier, the Support Vector Machine (SVM). The results show that applying all 14 techniques systematically can achieve an accuracy of up to 82.57% with the SVM classifier using unigram representations. With Text Detergent, the YouTube dataset achieves the highest accuracy compared with the Facebook and Twitter datasets. This will potentially improve the quality of the text and lead to better feature extraction, which in turn helps the sentiment analyst produce a better classifier.
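The abstract's point that pre-processing steps depend on their order can be illustrated with a minimal sketch. The six steps below, their names, and their sequence are assumptions chosen for illustration; they are not the 14 techniques or the significance-tested ordering used in the paper, and the SVM training stage is left out.

```python
import re
from typing import Callable, List

# A pre-processing step maps a string to a cleaned string.
Step = Callable[[str], str]

def lowercase(text: str) -> str:
    return text.lower()

def remove_urls(text: str) -> str:
    # Drop http(s) links and bare www. links.
    return re.sub(r"https?://\S+|www\.\S+", " ", text)

def remove_mentions(text: str) -> str:
    # Drop @username mentions.
    return re.sub(r"@\w+", " ", text)

def strip_hashtag_symbol(text: str) -> str:
    # Keep the hashtag word, drop only the '#' character.
    return text.replace("#", " ")

def remove_punctuation_and_digits(text: str) -> str:
    # Keep only lowercase ASCII letters and whitespace (assumes lowercase ran first).
    return re.sub(r"[^a-z\s]", " ", text)

def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical ordering: URLs and mentions are removed before punctuation,
# because stripping punctuation first would break the URL and mention patterns.
PIPELINE: List[Step] = [
    lowercase,
    remove_urls,
    remove_mentions,
    strip_hashtag_symbol,
    remove_punctuation_and_digits,
    collapse_whitespace,
]

def clean(text: str, steps: List[Step] = PIPELINE) -> str:
    """Apply each step in sequence to the text."""
    for step in steps:
        text = step(text)
    return text

def unigrams(text: str) -> List[str]:
    """Clean the text and split it into single-word (unigram) tokens."""
    return clean(text).split()

if __name__ == "__main__":
    post = "Flood in KL!! @user1 see http://t.co/abc #StaySafe"
    print(unigrams(post))
```

Running the same steps with punctuation removal moved ahead of URL removal leaves URL fragments in the output, which is one concrete way the order of techniques can change the features a classifier sees.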
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Accuracy, Classification, Social media, Support vector machine, Text pre-processing |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Computing |
ID Code: | 99744 |
Deposited By: | Widya Wahid |
Deposited On: | 19 Mar 2023 10:27 |
Last Modified: | 19 Mar 2023 10:27 |