Sam, Lee Zhi and Maarof, Mohd. Aizaini and Selamat, Ali (2006) Automated web pages classification with integration of principal component analysis (PCA) and independent component analysis (ICA) as feature reduction. In: Proceedings of International Conference on Man-Machine Systems 2006, September 15-16 2006, Langkawi, Malaysia.
Official URL: http://icomms.unimap.edu.my/index.htm
Abstract
With the explosive growth of the internet, web page classification has become an essential issue, because it gives internet users an efficient way to search for information. Without professional classification, a website becomes a jumble of content that is confusing and time-wasting to browse; with web page classification, visitors can navigate a website quickly and efficiently. However, most web directories are still classified manually or semi-automatically (by huge teams of human editors) [1]. Automated web page classification is therefore in high demand, both to replace expensive manpower and to reduce the time consumed. In this paper we analyze the concept of a new model that uses an integration of Principal Component Analysis (PCA) and Independent Component Analysis (ICA) as feature reduction for web page classification. The model consists of several modules: web page retrieval, stemming, stop-word filtering, feature reduction, feature selection, classification and evaluation.
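The abstract names the model's modules but not how each is implemented. The Python sketch below strings together simplified stand-ins for them, assuming NumPy is available, to show how PCA and ICA can be chained as a feature-reduction stage. Everything in it is hypothetical and illustrative rather than the authors' code. `STOP_WORDS`, `crude_stem`, `term_matrix`, `fast_ica`, `PcaIcaReducer` and `nearest_centroid` are invented names, and the suffix stripper stands in for a real stemmer. Symmetric FastICA with a tanh nonlinearity is one common ICA algorithm; the paper's choice is not stated here. The feature-selection module is omitted, and a nearest-centroid rule replaces the neural-network classifier that the keywords suggest. Web page retrieval is replaced by short in-memory strings.

```python
import re

import numpy as np

# Hypothetical stop-word list; a real system would use a much longer one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def crude_stem(word):
    """Hypothetical suffix stripper standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Tokenise a page's text, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def term_matrix(docs, vocab=None):
    """L2-normalised term-frequency matrix: one row per page, one column per term."""
    token_lists = [preprocess(d) for d in docs]
    if vocab is None:
        vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, toks in enumerate(token_lists):
        for t in toks:
            if t in index:  # terms outside the given vocabulary are ignored
                X[row, index[t]] += 1.0
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms), vocab

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, whose rows are orthonormal."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(Z, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA (tanh contrast) on already-whitened data Z of shape (n, k)."""
    n, k = Z.shape
    W = sym_decorrelate(np.random.default_rng(seed).standard_normal((k, k)))
    for _ in range(n_iter):
        g = np.tanh(Z @ W.T)
        # Fixed-point update w <- E[z g(w.z)] - E[g'(w.z)] w, for every row w at once.
        W_new = (g.T @ Z) / n - np.diag((1.0 - g**2).mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    return W

class PcaIcaReducer:
    """Hypothetical reducer: PCA to k components, whitening, then an ICA rotation."""

    def __init__(self, k):
        self.k = k

    def _whitened(self, X):
        return (X - self.mean) @ self.components.T / self.scale

    def fit(self, X):
        self.mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.components = vt[: self.k]  # top-k principal axes
        self.scale = ((X - self.mean) @ self.components.T).std(axis=0)
        self.W = fast_ica(self._whitened(X))  # ICA unmixing matrix
        return self

    def transform(self, X):
        return self._whitened(X) @ self.W.T

def nearest_centroid(F_train, labels, F_new):
    """Stand-in classifier: label each row of F_new with its closest class mean."""
    labels = np.array(labels)
    classes = sorted(set(labels.tolist()))
    centroids = np.array([F_train[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(F_new[:, None, :] - centroids[None, :, :], axis=2)
    return [classes[i] for i in dists.argmin(axis=1)]

# Toy stand-in for retrieved pages (web page retrieval itself is not shown).
train_docs = ["The football team", "Football and goals", "A football match",
              "The software code", "Software and Python", "Software for computers"]
train_labels = ["sport"] * 3 + ["computing"] * 3

X_train, vocab = term_matrix(train_docs)
reducer = PcaIcaReducer(k=2).fit(X_train)
F_train = reducer.transform(X_train)
X_new, _ = term_matrix(["Football!", "Software"], vocab)
print(nearest_centroid(F_train, train_labels, reducer.transform(X_new)))
# expected: ['sport', 'computing']
```

Whitening the retained principal components before ICA means that ICA only has to find a rotation of a small k-dimensional space. The reduced features therefore stay uncorrelated with unit variance, and the high-dimensional term space never has to be handled by the ICA step itself.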
| Field | Value |
|---|---|
| Item Type: | Conference or Workshop Item (Paper) |
Uncontrolled Keywords: | web page classification, web page retrieval, principal component analysis, independent component analysis, class profile based feature, neural networks |
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
Divisions: | Computer Science and Information System |
ID Code: | 3129 |
Deposited By: | Dr Ali Selamat |
Deposited On: | 24 Oct 2007 03:45 |
Last Modified: | 30 Sep 2017 04:28 |