Universiti Teknologi Malaysia Institutional Repository

A clickstream-based focused trend parallel web crawler

Ahmadi-Abkenari, F. and Selamat, Ali (2010) A clickstream-based focused trend parallel web crawler. International Journal of Computer Applications, 9 (5). pp. 1-8. ISSN 0975-8887


Official URL: http://www.ijcaonline.org/volume9/number5/pxc38718...

Abstract

The rapidly growing size of the World Wide Web creates many obstacles for general-purpose single-process crawlers, including incorrect answers among search results and scaling drawbacks. As a result, better heuristics are needed to deliver more accurate search results in a timely manner. Employing link-dependent Web page importance metrics within a parallel crawler imposes considerable overhead on the overall search system, and such metrics cannot cover authorized Web content in the dark net or authorized fresh pages. These metrics are therefore not a complete solution within a search engine's architecture. This paper proposes applying a link-independent Web page importance metric to govern the priority rule within the crawl frontier. To this end, it proposes a modest weighted architecture for a focused, structured parallel Web crawler (the CFP crawler), in which credit is assigned to URLs in the crawl frontier according to a clickstream-based prioritizing algorithm.
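The abstract does not spell out the prioritizing algorithm, so the following is only a minimal Python sketch of the general idea: a crawl frontier ordered by a link-independent score computed from clickstream records instead of the link graph. The record format (`Click`), the scoring rule (a hypothetical weighted mix of distinct-session count and total dwell time, controlled by `alpha`), and the names `clickstream_scores` and `ClickstreamFrontier` are illustrative assumptions, not the CFP crawler's actual metric or interface.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Click:
    """Hypothetical clickstream record: one session viewing one URL."""
    url: str
    session_id: str
    dwell_seconds: float


def clickstream_scores(clicks: Iterable[Click], alpha: float = 0.5) -> Dict[str, float]:
    """Assumed link-independent importance score in [0, 1]:
    alpha * (normalized distinct sessions) + (1 - alpha) * (normalized total dwell)."""
    sessions: Dict[str, set] = defaultdict(set)
    dwell: Dict[str, float] = defaultdict(float)
    for c in clicks:
        sessions[c.url].add(c.session_id)
        dwell[c.url] += c.dwell_seconds
    if not sessions:
        return {}
    max_sessions = max(len(s) for s in sessions.values())  # always >= 1
    max_dwell = max(dwell.values()) or 1.0  # avoid dividing by zero
    return {
        url: alpha * len(sessions[url]) / max_sessions
        + (1 - alpha) * dwell[url] / max_dwell
        for url in sessions
    }


class ClickstreamFrontier:
    """Crawl frontier that always yields the highest-scoring unvisited URL."""

    def __init__(self, scores: Dict[str, float], default_score: float = 0.0):
        self._scores = scores
        self._default = default_score  # used for URLs with no clickstream history
        self._heap: List[Tuple[float, int, str]] = []
        self._seen: set = set()
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def add(self, url: str) -> None:
        if url in self._seen:  # never enqueue the same URL twice
            return
        self._seen.add(url)
        score = self._scores.get(url, self._default)
        # heapq is a min-heap, so negate the score to pop the maximum first
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


# Usage with made-up records
clicks = [
    Click("http://a.example/", "s1", 30.0),
    Click("http://a.example/", "s2", 10.0),
    Click("http://b.example/", "s1", 120.0),
    Click("http://c.example/", "s3", 5.0),
]
scores = clickstream_scores(clicks)
frontier = ClickstreamFrontier(scores)
for u in ["http://a.example/", "http://b.example/", "http://c.example/", "http://d.example/"]:
    frontier.add(u)
order = []
while len(frontier):
    order.append(frontier.pop())
print(order)
# → ['http://b.example/', 'http://a.example/', 'http://c.example/', 'http://d.example/']
```

A heap keeps each `add` and `pop` at O(log n), and URLs without clickstream history fall back to `default_score`, so they are still crawled but only after pages that users actually visit. In a parallel setting, each crawler process could hold its own frontier of this kind; how the CFP crawler actually weights and distributes URLs is described in the full paper.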

Item Type: Article
Uncontrolled Keywords: clickstream analysis, focused crawlers, parallel crawlers, web data management, web page importance metrics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System
ID Code: 37000
Deposited On: 09 Mar 2014 08:45
Last Modified: 15 Feb 2017 00:33
