Universiti Teknologi Malaysia Institutional Repository

Content based fraudulent website detection using supervised machine learning techniques

Maktabar, Mahdi and Zainal, Anazida and Maarof, Mohd. Aizaini and Kassim, Mohamad Nizam (2018) Content based fraudulent website detection using supervised machine learning techniques. In: 17th International Conference on Hybrid Intelligent Systems, HIS 2017, 14 December 2017 through 16 December 2017, Delhi, India.

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1007/978-3-319-76351-4_30

Abstract

Fraudulent websites that pose as legitimate sources of information, goods, products and services are proliferating and have resulted in losses of billions of dollars. Because of the undesirable impacts of Internet fraud and scams, several studies and approaches have focused on identifying fraudulent websites, yet none of them has managed to offer an efficient solution to suppress these fraudulent activities. In this regard, this research proposes a fraudulent website detection model based on sentiment analysis of the textual content of a given website, natural language processing and supervised machine learning techniques. The proposed model consists of four primary phases: data acquisition, preprocessing, feature extraction and classification. A crawler is used to obtain data from the Internet, and the data are cleaned to remove non-discriminative noise and reshaped into the desired format. Meaningful and discriminative patterns are then extracted. Finally, the classification phase applies supervised machine learning techniques to construct the fraudulent website detection model. This research employs 10-fold stratified cross-validation to validate the performance of the proposed model. Experimental results show that the proposed model, with a cross-validated accuracy of 97.67% and a false positive rate (FPR) of 3.49%, achieved satisfactory results and served the aim of this research.
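
The evaluation terms used in the abstract (stratified k-fold cross-validation, accuracy, FPR) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is not available from this record. The function names stratified_folds and accuracy_and_fpr are hypothetical, and treating "fraudulent" as the positive class (label 1) is an assumption made here for illustration only.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Assign sample indices to k folds so each fold keeps similar class ratios.

    Hypothetical helper: deals the samples of each class round-robin
    across the folds (no shuffling, for reproducibility of the sketch).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def accuracy_and_fpr(y_true, y_pred, positive=1):
    """Return (accuracy, false positive rate) for binary labels.

    Assumption: `positive` marks the fraudulent class, so a false
    positive is a legitimate site flagged as fraudulent.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr
```

In a cross-validated run, each fold would serve once as the test set while the remaining folds train the classifier, and the per-fold metrics would then be averaged. Whether the paper averages per fold or pools predictions is not stated in the abstract.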

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: sentiment analysis, text mining, classification, bag of words, fraud detection
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 81884
Deposited By: Yanti Mohd Shah
Deposited On: 30 Sep 2019 12:59
Last Modified: 30 Sep 2019 12:59
