Ahmadi-Abkenari, Fatemeh and Selamat, Ali (2012) Parallel web crawler architecture for clickstream analysis. In: Communications In Computer And Information Science. Springer, Berlin, pp. 123-132. ISBN 978-3-642-32825-1 (Print); 978-3-642-32826-8 (Electronic)
Full text not available from this repository.
Official URL: http://dx.doi.org/10.1007/978-3-642-32826-8_13
Abstract
The tremendous growth of the Web poses many challenges for single-process crawlers, including irrelevant answers among search results as well as coverage and scaling issues. As a result, more robust algorithms are needed to produce more precise and relevant search results in a timely manner. Existing Web crawlers mostly implement link-dependent Web page importance metrics. One barrier to applying these metrics is that they produce considerable communication overhead in multi-agent crawlers. Moreover, they depend heavily on the size of their own index, so they fail to rank Web pages with complete accuracy. Hence, more enhanced metrics are needed in this area. Proposing a new Web page importance metric requires defining a new architecture as a framework in which to implement the metric. The aim of this paper is to propose an architecture for a focused parallel crawler. In this framework, decisions on Web page importance are based on a combined metric of clickstream analysis and analysis of context similarity to the issued queries.
Item Type: | Book Section |
---|---|
Additional Information: | Indexed by Scopus |
Uncontrolled Keywords: | clickstream analysis, parallel crawlers, web data management, web page importance metrics |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Computer Science and Information System |
ID Code: | 35741 |
Deposited By: | Fazli Masari |
Deposited On: | 29 Oct 2013 01:05 |
Last Modified: | 02 Feb 2017 04:57 |