Universiti Teknologi Malaysia Institutional Repository

Hybridized term-weighting method for web contents classification using SVM

Odeh Sabbah, Thabit Sulaiman and Selamat, Ali and Selamat, Md. Hafiz and Ibrahim, Roliana and Fujita, Hamido (2015) Hybridized term-weighting method for web contents classification using SVM. In: Software and Information Science, Iwate Prefectural University, 2015, Japan.

Full text not available from this repository.

Official URL: https://www.researchgate.net/publication/282254779...

Abstract

The role of intelligence and security informatics based on statistical computations is becoming more significant in the proactive detection of terrorist activities, as extremist groups misuse many of the facilities available on the Internet to incite violence and hatred. However, the performance of statistical methods is reported to be limited: their accuracy is inadequate because they cannot comprehend the meaning of texts created by humans. Misclassification of actual terrorism web content as non-terrorism, or vice versa, reduces the usefulness of intelligent techniques in supporting efforts against potential threats, and limits the opportunities for effective use of intelligence and security informatics in the early detection of terrorist activities. In this paper, we propose a hybridized method based on basic term-weighting techniques for accurate detection of terrorist activities in textual contexts. The proposed method combines the feature sets generated by different individual term-weighting techniques, such as Term Frequency (TF), Document Frequency (DF), Term Frequency-Inverse Document Frequency (TF-IDF), Glasgow, and Entropy, into one feature set for effective classification. Moreover, two combination functions are proposed to reduce the dimensionality of the combined feature set. The method is tested on a selected dataset from the Dark Web Portal Forum (DWPF) and benchmarked using Support Vector Machine (SVM) and other well-known text classifiers: K-Nearest Neighbor (KNN), Decision Trees (DT), Naïve Bayes (NB), and Extreme Learning Machine (ELM). Experimental results show that the hybridized method efficiently identifies terrorist-activity content and outperforms the individual methods.
The results further reveal that the classification performance achieved by hybridizing only a few feature sets is competitive with that of higher hybridization levels, relative to the number of features used for classification. In addition, the experiments on the hybridizing functions show that applying the symmetric difference function to combine feature sets significantly reduces their dimensionality.
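
The feature-set combination described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy corpus, the top-k selection rule, the helper names (`weight_terms`, `top_k`, `combine`) and the choice of union as the second combination function are all assumptions made for illustration. The abstract names only the symmetric difference function, and the Glasgow and Entropy weightings are omitted here.

```python
import math
from collections import Counter
from functools import reduce

# Hypothetical toy corpus; the paper uses Dark Web Portal Forum posts instead.
DOCS = [
    "attack plan attack target",
    "forum post about football match",
    "target plan meeting forum",
    "football match result post",
]


def weight_terms(docs):
    """Return corpus-level term scores for three basic weighting schemes."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # TF: total number of occurrences of each term in the corpus.
    tf = Counter(term for doc in tokenized for term in doc)
    # DF: number of documents that contain each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    # TF-IDF with the plain idf = log(N / df) variant (an assumption).
    tfidf = {term: tf[term] * math.log(n_docs / df[term]) for term in tf}
    return {"TF": dict(tf), "DF": dict(df), "TF-IDF": tfidf}


def top_k(scores, k):
    """Select the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:k]}


def combine(feature_sets, how):
    """Merge several feature sets into a single hybrid feature set."""
    sets = list(feature_sets)
    if how == "union":
        # Keeps every term selected by at least one scheme (assumed function).
        return set().union(*sets)
    if how == "symmetric_difference":
        # Chained pairwise: keeps terms selected by an odd number of schemes.
        return reduce(lambda a, b: a ^ b, sets)
    raise ValueError(f"unknown combination function: {how}")


scores = weight_terms(DOCS)
feature_sets = {name: top_k(s, 4) for name, s in scores.items()}
hybrid_union = combine(feature_sets.values(), "union")
hybrid_symdiff = combine(feature_sets.values(), "symmetric_difference")

print("union:", sorted(hybrid_union))
print("symmetric difference:", sorted(hybrid_symdiff))
```

Under these assumptions the symmetric difference can never be larger than the union, because it drops terms that an even number of schemes agree on. That is one simple way a combination function can reduce the dimensionality of the hybrid feature set before it is passed to a classifier such as an SVM.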

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: feature sets combination, text classification
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: Computing
ID Code: 63288
Deposited By: Widya Wahid
Deposited On: 17 May 2017 04:35
Last Modified: 21 Aug 2017 00:35
