Universiti Teknologi Malaysia Institutional Repository

Support vector machine algorithm for SMS spam classification in the telecommunication industry

Sjarif, N. N. A. and Yahya, Y. and Chuprat, S. and Azmi, N. H. F. M. (2020) Support vector machine algorithm for SMS spam classification in the telecommunication industry. International Journal on Advanced Science, Engineering and Information Technology, 10 (2). pp. 635-639. ISSN 2088-5334


Official URL: http://www.dx.doi.org/10.18517/ijaseit.10.2.10175


In recent years, the number of mobile users in the telecommunication industry has grown dramatically, and with it the number of spam SMS messages. Short Message Service (SMS) is a widely used communication service in telecommunications. In practice, most users ignore spam because SMS rates are low and few spam classification tools are available. In this paper, we propose a Support Vector Machine (SVM) algorithm for SMS spam classification. SVM is considered one of the most effective data mining techniques. The proposed algorithm was evaluated using a public dataset from the UCI Machine Learning Repository. Its performance is compared with three other data mining techniques: Naive Bayes, Multinomial Naive Bayes, and K-Nearest Neighbor (KNN) with K = 1, 3, and 5. Based on measures such as higher accuracy, less processing time, the highest kappa statistic, low error, and the lowest number of false positives, SVM outperforms the other classifiers and is the most accurate at detecting and labeling spam messages, with an average accuracy of 98.9%. Comparing both error measures overall, the highest error was found for KNN with K = 3 and K = 5, whereas the lowest error was obtained by SVM, followed by Multinomial Naive Bayes. The proposed method can therefore serve as a baseline for further comparison in SMS spam classification.
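
Several of the measures named in the abstract (accuracy, the kappa statistic, and false positives) can be derived from a binary confusion matrix. The following is a minimal Python sketch of those calculations, not the authors' evaluation code. The class `ConfusionMatrix`, the function names, and the example counts are hypothetical, chosen only for illustration, and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary spam/ham classifier (hypothetical helper)."""
    tp: int  # spam correctly labelled as spam
    fp: int  # ham wrongly labelled as spam (false positive)
    fn: int  # spam wrongly labelled as ham
    tn: int  # ham correctly labelled as ham

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def accuracy(cm: ConfusionMatrix) -> float:
    """Fraction of messages that were labelled correctly."""
    return (cm.tp + cm.tn) / cm.total


def false_positive_rate(cm: ConfusionMatrix) -> float:
    """Fraction of legitimate (ham) messages flagged as spam."""
    return cm.fp / (cm.fp + cm.tn)


def cohen_kappa(cm: ConfusionMatrix) -> float:
    """Cohen's kappa: agreement with the true labels beyond chance."""
    n = cm.total
    observed = accuracy(cm)
    # Chance agreement: probability that prediction and truth coincide
    # if both were drawn independently from their marginal distributions.
    chance_spam = ((cm.tp + cm.fp) / n) * ((cm.tp + cm.fn) / n)
    chance_ham = ((cm.fn + cm.tn) / n) * ((cm.fp + cm.tn) / n)
    expected = chance_spam + chance_ham
    # Undefined when expected == 1 (all messages in a single class).
    return (observed - expected) / (1 - expected)


# Invented counts for illustration only; NOT taken from the paper.
example = ConfusionMatrix(tp=40, fp=10, fn=5, tn=45)
print("accuracy:", round(accuracy(example), 3))
print("false positive rate:", round(false_positive_rate(example), 3))
print("kappa:", round(cohen_kappa(example), 3))
```

A kappa close to 1 indicates agreement well beyond what chance would produce, which is why it is often reported alongside accuracy on datasets where one class dominates.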

Item Type: Article
Uncontrolled Keywords: classification, data mining, short message service
Subjects: T Technology > T Technology (General)
Divisions: Razak School of Engineering and Advanced Technology
ID Code: 87522
Deposited By: Narimah Nawil
Deposited On: 08 Nov 2020 12:05
Last Modified: 08 Nov 2020 12:05
