Universiti Teknologi Malaysia Institutional Repository

Architecture for a parallel focused crawler for clickstream analysis

Selamat, Ali and Ahmadi-Abkenari, Fatemeh (2011) Architecture for a parallel focused crawler for clickstream analysis. In: The 3rd Asian Conference On Intelligent And Database System.

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1007/978-3-642-20039-7-3

Abstract

The tremendous growth of the Web poses many challenges for all-purpose, single-process crawlers, including irrelevant answers among search results and coverage and scaling problems caused by the sheer size of the World Wide Web. At the same time, more capable algorithms are in demand to yield more precise and relevant search results within a reasonable amount of time. Link-based Web page importance metrics are not a complete solution for identifying the best answer set, and using them within a multi-process crawler imposes considerable communication overhead on the overall system. A link-independent Web page importance metric is therefore required to govern the priority rule within the queue of fetched URLs. This paper proposes a modest weighted architecture for a focused, structured parallel crawler in which credit is assigned to discovered URLs using a combined metric based on clickstream analysis and on the text similarity of each Web page to the specified mapped topic(s).
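
The abstract does not give the formula for the combined metric, so the following is only an illustrative sketch of how a link-independent score might order a crawl queue. Everything in it is an assumption rather than the authors' method: the linear weighting, the `weight` parameter, the click normalisation by `max_clicks`, the bag-of-words cosine similarity, and the names `cosine_similarity`, `combined_score`, and `Frontier`.

```python
import heapq
import math
from collections import Counter


def cosine_similarity(text, topic):
    """Cosine similarity between term-frequency vectors of two strings."""
    a = Counter(text.lower().split())
    b = Counter(topic.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def combined_score(clicks, max_clicks, text, topic, weight=0.5):
    """Hypothetical credit: weighted sum of click share and topic similarity."""
    click_part = clicks / max_clicks if max_clicks else 0.0
    return weight * click_part + (1 - weight) * cosine_similarity(text, topic)


class Frontier:
    """URL queue that always yields the highest-scored URL first."""

    def __init__(self):
        self._heap = []
        self._count = 0  # insertion counter, breaks ties between equal scores

    def push(self, url, score):
        # heapq is a min-heap, so the score is negated to pop the maximum.
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

Because such a score depends only on a page's own click count and text, each crawler process could compute it locally without exchanging link information, which is the motivation the abstract gives for a link-independent metric.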

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: clickstream analysis
Divisions: Computing
ID Code: 45605
Deposited By: Haliza Zainal
Deposited On: 10 Jun 2015 03:00
Last Modified: 29 Aug 2017 00:57
