Universiti Teknologi Malaysia Institutional Repository

Web classification using extraction and machine learning techniques

Salim, Naomie and Yusuf, L. M. and Othman, M. S. (2010) Web classification using extraction and machine learning techniques. In: International Symposium On Information Technology 2010 (ITSim'10), 15-17 Jun 2010, Kuala Lumpur.

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1109/ITSIM.2010.5561603


Easier access to internet services has contributed to a drastic increase in the number of web pages. This growth has made it harder for internet users to retrieve the latest, most relevant and highest-quality web information, because the enormous volume of web content has caused problems in restructuring that information. Web document classification is therefore necessary to ensure that the latest, relevant and high-quality web information can be retrieved optimally. This paper discusses the results of classifying web documents using extraction and machine learning techniques. Four types of kernel, namely the Radial Basis Function (RBF), linear, polynomial and sigmoid kernels, are applied to test the accuracy of the classification. The results show that classification accuracy increases as more web documents are used. The results also show that the linear kernel performs best in web document classification compared with the RBF, polynomial and sigmoid kernels.
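For readers unfamiliar with the four kernels named in the abstract, the sketch below shows how each one scores the similarity of two documents. It is an illustration only, not the authors' implementation: the abstract gives neither the kernel formulas nor the parameter values, so the standard definitions used by common support vector machine libraries are assumed, and the term-frequency features, the vocabulary, the sample texts and the parameters gamma, coef0 and degree are all hypothetical.

```python
import math
from collections import Counter


def term_frequencies(text, vocabulary):
    """Turn a text into a term-frequency vector (an assumed feature type)."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = x . y
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    # K(x, y) = (gamma * x . y + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # K(x, y) = tanh(gamma * x . y + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)


# Hypothetical vocabulary and page texts, chosen only for illustration.
vocabulary = ["football", "match", "stock", "market"]
page_a = term_frequencies("football match football", vocabulary)
page_b = term_frequencies("stock market match", vocabulary)

kernels = {
    "linear": linear_kernel,
    "polynomial": polynomial_kernel,
    "rbf": rbf_kernel,
    "sigmoid": sigmoid_kernel,
}

for name, kernel in kernels.items():
    print(f"{name}: {kernel(page_a, page_b):.4f}")
```

A kernel-based classifier would evaluate one of these functions on pairs of documents during training and prediction. The linear kernel is simply the inner product of the feature vectors and has no parameters of its own to tune, whereas the other three depend on the assumed gamma, coef0 and degree values.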

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: extraction, machine learning, web classification, web document
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 24405
Deposited By: Mrs Liza Porijo
Deposited On: 20 Sep 2012 03:13
Last Modified: 20 Sep 2012 03:13