Universiti Teknologi Malaysia Institutional Repository

Spam filtering using bayesian technique based on independent feature selection

Mohamad, Masurah (2006) Spam filtering using bayesian technique based on independent feature selection. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information System.

Abstract

The Bayesian technique is a classification technique that can be applied to a range of problem domains. In this project it was chosen to classify an email dataset consisting of spam and non-spam emails, in order to observe whether it produces good results in spam classification. In addition, the project applied the Rough set technique as a comparison. Classification is based on independent feature selection, in which only the single most frequently occurring term in each email is chosen as input to the Bayesian probability. Classification performance was evaluated using several measures: precision, recall, sensitivity, specificity, accuracy and error rate. The two techniques were then compared on the experimental results to identify which is better at classifying spam emails. The results show that the Bayesian technique performs better than the Rough set technique in classifying spam emails. However, the results also indicate that the Rough set technique is suitable for the spam filtering problem. Finally, some suggestions are discussed for improving on the current results in future work.
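
As a rough illustration of the approach the abstract describes, the following Python sketch picks the single most frequent term of each email, scores it with Bayes' rule, and computes the six evaluation measures listed above. It is not the thesis implementation: the whitespace tokenizer, the Laplace smoothing, the toy emails, and all function names are assumptions made for illustration only.

```python
from collections import Counter

def top_term(text):
    """Return the most frequent token of an email (ties: first seen)."""
    tokens = text.lower().split()  # assumed tokenizer: lowercase + whitespace
    return Counter(tokens).most_common(1)[0][0] if tokens else ""

def train(emails, labels):
    """Count each email's top term per class (1 = spam, 0 = non-spam)."""
    term_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter(labels)
    for text, label in zip(emails, labels):
        term_counts[label][top_term(text)] += 1
    vocab = set(term_counts[0]) | set(term_counts[1])
    return term_counts, class_counts, vocab

def predict(text, model):
    """Pick the class maximising P(class) * P(top term | class)."""
    term_counts, class_counts, vocab = model
    term = top_term(text)
    total = sum(class_counts.values())
    scores = {}
    for c in (0, 1):
        prior = class_counts[c] / total
        # Laplace smoothing (an assumption) so unseen terms do not score zero.
        likelihood = (term_counts[c][term] + 1) / (class_counts[c] + len(vocab) + 1)
        scores[c] = prior * likelihood
    return 1 if scores[1] > scores[0] else 0

def evaluate(y_true, y_pred):
    """Compute the measures named in the abstract from a confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)

    def ratio(a, b):
        return a / b if b else 0.0  # avoid division by zero on empty classes

    recall = ratio(tp, tp + fn)
    return {
        "precision": ratio(tp, tp + fp),
        "recall": recall,
        "sensitivity": recall,  # sensitivity is the same quantity as recall
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, n),
        "error_rate": ratio(fp + fn, n),
    }

# Hypothetical toy data, not the dataset used in the thesis.
emails = ["win money money now", "cheap pills pills",
          "meeting at noon noon", "lunch lunch tomorrow"]
labels = [1, 1, 0, 0]
model = train(emails, labels)
```

Calling predict("money money offer", model) on this toy model returns the spam label, because "money" was seen as a top term only in spam. Note that sensitivity and recall are computed identically, so the six measures carry five distinct values.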

Item Type: Thesis (Masters)
Additional Information: Thesis (Master of Science (Computer Science)) - Universiti Teknologi Malaysia, 2006; Supervisor: Dr. Ali Bin Selamat
Uncontrolled Keywords: Spam filtering system; Email filtering; classification techniques; Bayesian technique; rough set technique
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 4066
Deposited By: Ms Zalinda Shuratman
Deposited On: 24 Jul 2007 09:18
Last Modified: 09 Jul 2012 01:13
