Universiti Teknologi Malaysia Institutional Repository

Evaluation of machine learning techniques for imbalanced data in IDS

Mokaramian, Shahram (2013) Evaluation of machine learning techniques for imbalanced data in IDS. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computing.


Official URL: http://dms.library.utm.my:8080/vital/access/manage...


A Network Intrusion Detection System (IDS) is an automated system that detects malicious traffic, and it plays a critical role in a network. In recent years, machine learning algorithms have been developed and used to detect network intrusions. Most standard machine learning algorithms give high overall accuracy, but they favor the majority class when dealing with imbalanced data. IDS data is highly imbalanced, and most machine learning algorithms detect the R2L (Remote to Local) and U2R (User to Root) classes poorly, even though these classes include malicious attacks. A resampling technique is therefore required to balance the data. The purpose of this study is to investigate the performance of three machine learning algorithms, Support Vector Machine (SVM), Decision Tree (DT) and Fuzzy Classifier (FC), on imbalanced IDS data and on the same data after rebalancing with the Synthetic Minority Over-sampling TEchnique (SMOTE). The benchmark DARPA KDDCup 1999 IDS dataset was used. SMOTE was applied with two imbalance ratios, 1:4 and 1:1, and the three algorithms were then evaluated on the rebalanced data. Analysis of the results before and after resampling showed that FC performs better with an imbalance ratio of 1:1. With balanced data, the accuracy of FC was 99.19% for Normal traffic, 99.35% for Denial of Service attacks, 99.51% for Probe attacks, 99.67% for Remote to Local attacks and 99.41% for User to Root attacks. In addition, data with an imbalance ratio of 1:1 gave better results on all classes for all three machine learning algorithms.
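The resampling step described in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the thesis implementation. The function names `smote` and `samples_needed`, the default `k=5`, the reading of "1:4" as minority:majority, and the toy two-feature data are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def samples_needed(n_minority, n_majority, ratio):
    """Synthetic samples needed so minority/majority reaches ratio.

    Assumption: ratio is minority:majority, so 1:4 is 0.25 and 1:1 is 1.0.
    """
    target = math.ceil(n_majority * ratio)
    return max(0, target - n_minority)


# Toy data, invented for illustration only (not KDDCup 1999 records).
majority = [(float(i), float(i % 7)) for i in range(40)]
minority = [(100.0, 100.0), (101.0, 100.0), (100.0, 102.0), (102.0, 101.0)]

for ratio in (1 / 4, 1 / 1):
    extra = smote(minority, samples_needed(len(minority), len(majority), ratio), k=3)
    print(f"ratio {ratio}: minority size after resampling = {len(minority) + len(extra)}")
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the minority class rather than duplicating existing records.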

Item Type: Thesis (Masters)
Additional Information: Thesis (Sarjana Sains Komputer (Keselamatan Maklumat)) - Universiti Teknologi Malaysia, 2013; Supervisor: Dr. Anazida Zainal
Uncontrolled Keywords: machine learning, computational learning theory
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering > TK7885-7895 Computer engineering. Computer hardware
ID Code: 37080
Deposited By: Fazli Masari
Deposited On: 31 Mar 2014 01:45
Last Modified: 29 Jun 2017 07:03
