Universiti Teknologi Malaysia Institutional Repository

Neural networks for web news classification based on augmented PCA

Selamat, Ali and Omatu, Sigeru (2003) Neural networks for web news classification based on augmented PCA. In: International Joint Conference on Neural Networks (IJCNN 2003), July 20-24, 2003, Portland, Oregon, USA.

Official URL: http://ieeexplore.ieee.org/iel5/8672/27485/0122367...

Abstract

In this paper, we propose a news web page classification method (WPCM). The WPCM uses a neural network whose inputs combine principal components with class profile-based features (CPBF). Each news web page is represented using a term-weighting scheme. Because the number of unique words in the collection is large, principal component analysis (PCA) is used to select the most relevant features for classification. The output of the PCA is then augmented with feature vectors from the class profile, which contains the most regular words in each class, before being fed to the neural network. We manually selected the most regular words in each class and weighted them using an entropy weighting scheme. A fixed number of regular words from each class is used as feature vectors, together with the reduced principal components from the PCA. These feature vectors are then used as the input to the neural network for classification. The experimental evaluation demonstrates that the WPCM provides acceptable classification accuracy on the sports news datasets.
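
The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact term-weighting or entropy formulas, so this sketch assumes TF-IDF for the term weighting and the common log-entropy formulation for the class-profile weights. The names tfidf, pca_reduce, entropy_weights, build_features, the toy count matrix and the profile_idx column indices are all hypothetical. The neural network itself is left out; the augmented matrix is what would be passed to it as input.

```python
import numpy as np


def tfidf(counts):
    """TF-IDF term weighting for an (n_docs, n_terms) count matrix (assumed scheme)."""
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)       # document frequency per term
    idf = np.log(n_docs / np.maximum(df, 1))    # guard against unused terms
    return counts * idf


def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)                     # centre each feature column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def entropy_weights(counts):
    """Global log-entropy weight per term (assumed formulation, not from the paper)."""
    n_docs = counts.shape[0]
    totals = np.maximum(counts.sum(axis=0), 1.0)
    p = counts / totals                         # share of each term's mass per document
    safe_p = np.where(p > 0, p, 1.0)            # log(1) = 0 where the term is absent
    return 1.0 + (p * np.log(safe_p)).sum(axis=0) / np.log(n_docs)


def build_features(counts, profile_idx, k):
    """Concatenate k principal components with entropy-weighted profile terms."""
    counts = np.asarray(counts, dtype=float)
    pcs = pca_reduce(tfidf(counts), k)
    g = entropy_weights(counts)
    cpbf = np.log1p(counts[:, profile_idx]) * g[profile_idx]
    return np.hstack([pcs, cpbf])


# Toy data: 4 pages x 5 terms. Columns 0 and 1 stand in for manually chosen
# "regular words" of two hypothetical classes.
counts = np.array([
    [2, 0, 1, 0, 1],
    [3, 0, 0, 1, 1],
    [0, 2, 0, 1, 1],
    [0, 3, 1, 0, 1],
], dtype=float)
profile_idx = [0, 1]

features = build_features(counts, profile_idx, k=2)
print(features.shape)
```

Each row of features holds the reduced principal components followed by the class-profile features for one page. In this toy data the last term occurs equally often in every page, so its entropy weight is zero, which reflects that it carries no discriminating information.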

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: neural network, web page classification, principal component analysis
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: Computer Science and Information System (Formerly known)
ID Code: 3123
Deposited By: Dr Ali Selamat
Deposited On: 13 Feb 2008 05:13
Last Modified: 10 May 2011 05:28
