Universiti Teknologi Malaysia Institutional Repository

Comparative study of machine learning algorithms in website phishing detection

Kalybayev, Almukhammed (2013) Comparative study of machine learning algorithms in website phishing detection. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computing.

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

Malicious programs designed to steal user credentials have multiplied in recent years, exposing victims to financial loss. Attackers use a variety of methods to collect confidential information, while online banking systems remain the main target of these attacks. The most widespread defence against phishing today is the use of blacklists in antivirus software and browser toolbars. Unfortunately, the blacklist method fails to respond to newly emerging phishing attacks: because registering new domain names has become easier, no blacklist can guarantee a fully up-to-date database. Countering phishing therefore requires an approach that is more accurate and efficient than blacklisting. The purpose of this work is to evaluate and analyze the effectiveness of applying machine learning algorithms, namely an Artificial Neural Network, Support Vector Machines and K-nearest Neighbor, to website phishing detection. Datasets of phishing and non-phishing websites were gathered in order to train and test the machine learning models and to compare the evaluation metrics of the algorithms with one another. In addition, the final dataset was divided into three datasets with different ratios to see whether the trained models show consistent performance in the test results and whether these proportions help or hinder the ability of the trained models to classify websites. Finally, the performance of each machine learning algorithm was analyzed. This project suggests the Support Vector Machines algorithm as the best one to use in phishing detection regardless of dataset proportion, because it showed almost the same performance throughout all test phases, averaging 98.5%.
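
The evaluation protocol described in the abstract (train a model, test it, and repeat at several dataset proportions) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the data are synthetic, each site has two made-up numeric features, the split fractions 0.5, 0.7 and 0.9 are hypothetical, and only a simple K-nearest Neighbor classifier written in plain Python is shown rather than the three algorithms the thesis compares.

```python
import random
from collections import Counter


def make_synthetic_sites(n, seed=0):
    """Build hypothetical data: two numeric features per site, label 1 = phishing."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so the classes are balanced
        centre = 2.0 if label else -2.0
        features = (rng.gauss(centre, 1.0), rng.gauss(centre, 1.0))
        data.append((features, label))
    rng.shuffle(data)
    return data


def split(data, train_fraction):
    """Split the data into a training part and a testing part."""
    cut = round(len(data) * train_fraction)
    return data[:cut], data[cut:]


def knn_predict(train, x, k=5):
    """Predict a label by majority vote among the k nearest training points."""
    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], x))

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Return the fraction of test points classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


def evaluate_ratios(data, fractions=(0.5, 0.7, 0.9), k=5):
    """Measure accuracy at each (assumed) training fraction."""
    results = {}
    for fraction in fractions:
        train, test = split(data, fraction)
        results[fraction] = accuracy(train, test, k=k)
    return results


if __name__ == "__main__":
    sites = make_synthetic_sites(200)
    for fraction, score in evaluate_ratios(sites).items():
        print(f"train fraction {fraction:.1f}: accuracy {score:.3f}")
```

Comparing the scores across the fractions shows whether a model behaves consistently as the proportion changes, which is the property the abstract cites in favour of Support Vector Machines.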

Item Type: Thesis (Masters)
Additional Information: Thesis (Sarjana Sains Komputer (Keselamatan Maklumat)) - Universiti Teknologi Malaysia, 2013; Supervisor: Dr. Anazida Zainal
Uncontrolled Keywords: internet security measures, internet fraud, phishing
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Computing
ID Code: 35828
Deposited By: Kamariah Mohamed Jong
Deposited On: 23 Feb 2014 07:35
Last Modified: 29 Jun 2017 06:40
