Universiti Teknologi Malaysia Institutional Repository

Performance improvement of poem genre classification using a combination of SMOTE and support vector machine

Quratu Aini, Quratu Aini and Muljono, Muljono and Yakub, Fitri (2023) Performance improvement of poem genre classification using a combination of SMOTE and support vector machine. In: 2023 International Seminar on Application for Technology of Information and Communication (iSemantic), 16 September 2023-17 September 2023, Semarang, Indonesia.

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1109/iSemantic59612.2023.1029...

Abstract

Text classification aims to assign documents to the correct class. This study uses a support vector machine (SVM) for text classification and evaluates how well it performs. A total of 841 poems were categorized into four genres: affection, death, environment, and music. The text is cleaned in a preprocessing stage consisting of case folding (lowercasing, removing punctuation and whitespace), tokenization, stopword removal, and lemmatization. Feature extraction using Bag of Words (BoW) produces 6860 features, which are then weighted using TF-IDF. The data is split into training and test sets at ratios of 80:20, 70:30, and 60:40. Because the classes are imbalanced, the data is balanced using SMOTE. The splits are applied to the original data, the balanced data, and the balanced data with PCA. The highest training accuracy, 97%, is obtained with the balanced data at a 60:40 split, while the highest test accuracy, 87%, is obtained with the balanced data at an 80:20 split. Thus, the highest accuracy on both the training and test data is obtained with balanced data.
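
The TF-IDF weighting and SMOTE balancing steps named in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the abstract does not state which library, IDF variant, or SMOTE parameters were used, so the smoothed IDF formula, the function names `tfidf` and `smote`, the neighbour count `k`, and the toy poems and labels below are all assumptions chosen for illustration. The SVM training and PCA steps are omitted.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Weight Bag-of-Words counts with TF-IDF; docs is a list of token lists."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant, not taken from the paper).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)  # raw term counts = the BoW representation
        vectors.append([counts[t] * idf[t] for t in vocab])
    return vocab, vectors


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, already-preprocessed poems (hypothetical data, not from the study).
docs = [
    ["love", "heart"],
    ["love", "kiss"],
    ["heart", "kiss", "love"],
    ["love", "love", "heart"],
    ["grave", "night"],
    ["grave", "sorrow", "night"],
]
labels = ["affection"] * 4 + ["death"] * 2

vocab, X = tfidf(docs)
minority = [x for x, y in zip(X, labels) if y == "death"]
n_new = Counter(labels)["affection"] - len(minority)
synthetic = smote(minority, n_new, k=1)

# After oversampling, both classes have the same number of feature vectors.
X_balanced = X + synthetic
y_balanced = labels + ["death"] * n_new
print(Counter(y_balanced))
```

Each synthetic vector lies on the line segment between two real minority-class vectors, so SMOTE adds new points in feature space rather than duplicating existing poems. With more than two classes, as in the four-genre setting described above, the same oversampling would be repeated for each under-represented class.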

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Imbalanced Data, PCA, SMOTE, SVM, text classification, TF-IDF
Subjects: T Technology > T Technology (General)
Divisions: Malaysia-Japan International Institute of Technology
ID Code: 107699
Deposited By: Widya Wahid
Deposited On: 02 Oct 2024 06:29
Last Modified: 02 Oct 2024 06:29
