Universiti Teknologi Malaysia Institutional Repository

Improved scheme of e-mail spam classification using meta-heuristics feature selection and support vector machine

Elssied Hamed, Nadir Omer Fadl (2015) Improved scheme of e-mail spam classification using meta-heuristics feature selection and support vector machine. PhD thesis, Universiti Teknologi Malaysia, Faculty of Computing.

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

With the technological revolution of the 21st century, electronic mail (e-mail) has reduced the time and distance of communication. However, the growing use of e-mail has also led to the emergence and continued growth of problems caused by unsolicited bulk e-mail, commonly referred to as spam. Many existing supervised algorithms, such as the Support Vector Machine (SVM), have been developed to stop spam e-mail. However, large data volumes and a high-dimensional feature space can lead to long execution times and low accuracy in spam e-mail classification. Removing irrelevant and redundant features, and finding the optimal (or near-optimal) subset of features, significantly influences the performance of spam e-mail classification and has become one of the important challenges. Therefore, in order to optimize spam e-mail classification accuracy, the dimensionality reduction problem needs to be solved. Feature selection schemes are very important because they reduce dimensionality by selecting a proper feature subset, which facilitates the classification process. The objective of this study is to investigate and improve schemes that reduce the execution time and increase the accuracy of spam e-mail classification. The methodology of this study comprises four schemes: one-way ANOVA F-test; Binary Differential Evolution (BDE); Opposition Differential Evolution (ODE) and Opposition Particle Swarm Optimization (OPSO); and a combination of Differential Evolution (DE) and Particle Swarm Optimization (PSO). The four schemes were used to improve spam e-mail classification accuracy. The proposed schemes achieved a classification accuracy of 95.05% using 41 features, with a population size of 50 and 1000 iterations over 20 runs. The experiments were carried out on the spambase and spamassassin benchmark datasets to evaluate the feasibility of the proposed schemes. The experimental findings demonstrate that the improved schemes were able to efficiently reduce the number of features while also improving e-mail classification accuracy.
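
To illustrate the first of the schemes named in the abstract, the following is a minimal sketch of ranking features by the one-way ANOVA F statistic, written in plain Python. It is not the thesis implementation: the function names `anova_f_score` and `rank_features`, the toy data, and the choice to rank features (rather than apply a significance threshold) are all assumptions made for illustration, since the abstract does not describe the exact procedure.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature, grouped by class label."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows, labels):
    """Return (feature indices sorted by descending F score, raw scores)."""
    n_features = len(rows[0])
    scores = [
        anova_f_score([r[j] for r in rows], labels) for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: 4 e-mails, 3 features, label 1 = spam, 0 = not spam.
rows = [
    [0.9, 0.2, 5.0],
    [0.8, 0.1, 1.0],
    [0.1, 0.2, 4.0],
    [0.2, 0.1, 2.0],
]
labels = [1, 1, 0, 0]
order, scores = rank_features(rows, labels)
```

In this toy data only the first feature separates the two classes, so it receives the highest score. In a pipeline of the kind the abstract describes, the top-ranked features would then be passed to a classifier such as an SVM.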

Item Type: Thesis (PhD)
Additional Information: Thesis (Ph.D (Sains Komputer)) - Universiti Teknologi Malaysia, 2015; Supervisor: Assoc. Prof. Dr. Othman Ibrahim
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 77765
Deposited By: Fazli Masari
Deposited On: 04 Jul 2018 11:44
Last Modified: 04 Jul 2018 11:44
