Universiti Teknologi Malaysia Institutional Repository

A comparative evaluation of machine learning approaches in SMS spam detection

Salehi, Saber (2011) A comparative evaluation of machine learning approaches in SMS spam detection. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information System.

Abstract

Spam detection is a significant problem that many researchers have addressed with a variety of strategies. In this study, classification performance is measured by false positives, false negatives and accuracy. These metrics were evaluated and compared for three supervised learning algorithms applied to the classification of SMS contents: a hybrid of the Simple Artificial Immune System (SAIS) and Particle Swarm Optimization (PSO), the Naive Bayes Classifier (NBC), and an Enhanced Genetic Algorithm (EGA). SAIS was hybridized with PSO to optimize its performance for spam filtering. PSO was used together with mutation to reinforce the immune system’s search for the best class exemplar for classification. Results improved with the hybrid of SAIS and PSO. The proposed EGA searches for the best chromosomes, which are grouped by keywords; the chromosome with the highest fitness value is then selected as the classifier. Simulated annealing (SA) was used with classical mutation and crossover to reinforce the efficiency of the genetic search. The results indicate that the enhanced GA is markedly superior to a classical GA. The algorithms were trained and tested on a set of 4601 SMS messages, of which 1813 were spam and 2788 were non-spam. The proposed EGA technique outperformed both the NBC and the hybrid of SAIS and PSO: EGA achieved 99.87% accuracy, while NBC and the hybrid of SAIS and PSO achieved 97.457% and 88.33% accuracy, respectively.
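
The abstract names false positives, false negatives and accuracy as its measures but does not give formulas. The following is a minimal sketch of how these quantities are conventionally computed, assuming spam is treated as the positive class. The function names and the toy labels are illustrative and are not taken from the thesis. The last lines use the class split reported in the abstract to compute the accuracy of always predicting non-spam, which gives a baseline against which the reported accuracies can be read.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (tp, fp, tn, fn), treating `positive` as the spam class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["tn"], counts["fn"]


def rates(tp, fp, tn, fn):
    """Accuracy, false positive rate and false negative rate."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of legitimate messages wrongly flagged as spam.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of spam messages that slipped through.
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Toy labels for illustration only (not data from the thesis).
y_true = ["spam", "ham", "spam", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham"]
toy = rates(*confusion_counts(y_true, y_pred))
print(toy)

# Class split reported in the abstract: 1813 spam, 2788 non-spam.
N_SPAM, N_HAM = 1813, 2788
majority_baseline = N_HAM / (N_SPAM + N_HAM)
print(f"always-non-spam accuracy: {majority_baseline:.3f}")
```

On the reported split, a classifier that labels every message as non-spam would be correct for roughly 60.6% of messages, so accuracy alone says little without the false positive and false negative rates.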

Item Type: Thesis (Masters)
Additional Information: Thesis (Sarjana Sains (Teknologi Maklumat)) - Universiti Teknologi Malaysia, 2011; Supervisor: Dr. Ali Selamat
Uncontrolled Keywords: particle swarm optimization, spam detection, simulated annealing
Subjects: H Social Sciences > HD Industries. Land use. Labor
Divisions: Computer Science and Information System (Formerly known)
ID Code: 32801
Deposited By: Narimah Nawil
Deposited On: 29 Jul 2013 08:24
Last Modified: 27 May 2018 15:54