Ahmadi-Abkenari, Fatemeh and Selamat, Ali (2012) An architecture for a focused trend parallel web crawler with the application of clickstream analysis. Information Sciences, 184 (1). pp. 266-281. ISSN 0020-0255
Full text not available from this repository.
Official URL: http://dx.doi.org/10.1016/j.ins.2011.08.022
Abstract
The tremendous growth of the Web poses many challenges for all-purpose single-process crawlers, including the presence of irrelevant answers among search results and coverage and scaling issues arising from the enormous size of the World Wide Web. Hence, more enhanced and convincing algorithms are in demand to yield more precise and relevant search results in an appropriate amount of time. Since employing link-based Web page importance metrics within a multi-process crawler imposes considerable communication overhead on the overall system and cannot produce a precise answer set, employing these metrics in search engines is not a complete solution for identifying the best answer set. Thus, a link-independent Web page importance metric is required to govern the priority rule within the queue of fetched URLs. The aim of this paper is to propose a modest weighted architecture for a focused structured parallel Web crawler which employs a link-independent, clickstream-based Web page importance metric. Experiments with this metric over the restricted-boundary Web zone of our crowded UTM University Web site show the efficiency of the proposed metric.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | clickstream analysis, focused crawlers, parallel crawlers, web data management, web page importance metrics |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Computer Science and Information System |
ID Code: | 28674 |
Deposited By: | Yanti Mohd Shah |
Deposited On: | 12 Nov 2012 08:32 |
Last Modified: | 28 Jan 2019 03:38 |