Universiti Teknologi Malaysia Institutional Repository

Text content analysis for illicit web pages by using neural networks

Maarof, Mohd. Aizaini and Selamat, Ali and Shamsuddin, Siti Mariyam and Lee, Zhi Sam (2009) Text content analysis for illicit web pages by using neural networks. Jurnal Teknologi, 50 (D). pp. 73-91. ISSN 2180-3722

Abstract

Illicit web content such as pornography, violence, and gambling has greatly polluted the minds of web users, especially children and teenagers. Because some popular web filtering techniques, such as Uniform Resource Locator (URL) blocking and Platform for Internet Content Selection (PICS) checking, are ineffective against today's dynamic web content, content-based analysis techniques with an effective model are highly desired. In this paper, we propose a textual content analysis model that uses an entropy term weighting scheme to classify pornography and sex education web pages. We compare the entropy scheme with two other common term weighting schemes, TFIDF and Glasgow. All three schemes were tested with an artificial neural network on a small class dataset. We found that the proposed model achieved better performance in terms of accuracy, convergence speed, and stability than the other techniques.
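As a rough illustration of what a term weighting scheme does, the sketch below computes TF-IDF weights and a log-entropy weight for a toy corpus. This record does not give the paper's formulas, so the textbook formulations used here are an assumption and may differ from the authors' exact scheme; the function names and the whitespace tokenizer are hypothetical. The Glasgow scheme is left out for the same reason.

```python
import math
from collections import Counter


def term_counts(docs):
    """Count tokens per document (naive lowercase whitespace tokenization)."""
    return [Counter(doc.lower().split()) for doc in docs]


def tfidf_weights(counts):
    """Textbook TF-IDF: tf * log(N / df). Assumed form, not taken from the paper."""
    n_docs = len(counts)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each document counts a term at most once
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in c.items()}
            for c in counts]


def log_entropy_weights(counts):
    """Common log-entropy weighting: log(1 + tf) * (1 + sum(p log p) / log N).

    Assumed form, not taken from the paper. Terms spread evenly over all
    documents get a global weight near 0; terms concentrated in a single
    document get a global weight of 1.
    """
    n_docs = len(counts)
    gf = Counter()
    for c in counts:
        gf.update(c)  # global frequency of each term across the corpus
    global_w = {}
    for term, total in gf.items():
        ent = 0.0
        for c in counts:
            tf = c.get(term, 0)
            if tf:
                p = tf / total
                ent += p * math.log(p)
        global_w[term] = 1.0 + ent / math.log(n_docs) if n_docs > 1 else 1.0
    return [{t: math.log(1 + tf) * global_w[t] for t, tf in c.items()}
            for c in counts]


if __name__ == "__main__":
    docs = ["the health clinic health", "the video", "the policy"]
    counts = term_counts(docs)
    print(tfidf_weights(counts)[0])
    print(log_entropy_weights(counts)[0])
```

In this toy corpus, "the" occurs once in every document, so both schemes weight it at or near zero, while "health" is concentrated in one document and keeps a high weight. The resulting weight vectors are the kind of input a neural network classifier would take.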

Item Type: Article
Uncontrolled Keywords: artificial neural network, term weighting scheme, textual content analysis, web pages classification
Subjects: Q Science > Q Science (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 21030
Deposited By: Ramli Haron
Deposited On: 14 Jan 2012 05:09
Last Modified: 21 Nov 2013 01:23
