Breadth-first search crawling yields high-quality pages


Author(s) : Marc Najork, 
Publisher : N/A
Publication Date : 2001
ISSN : N/A
Abstract : This paper examines the average page quality over time of pages downloaded during a web crawl of 328 million unique pages. We use the connectivity-based metric PageRank to measure the quality of a page. We show that traversing the web graph in breadth-first search order is a good crawling strategy, as it tends to discover high-quality pages early in the crawl.
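
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' crawler or their evaluation code: the four-page link graph, the function names bfs_order and pagerank, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration. The sketch visits pages in breadth-first order from a seed, scores every page with a simple power-iteration PageRank, and then computes the average score of the pages fetched so far at each step of the crawl.

```python
from collections import deque


def bfs_order(graph, seed):
    """Return pages in the order a breadth-first crawl from `seed` visits them."""
    seen = {seed}
    order = []
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; the damping value is an assumed default."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = graph.get(page, [])
            if targets:
                # Split this page's damped rank evenly across its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy link graph: page -> pages it links to.
TOY_GRAPH = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": ["c"],
}

order = bfs_order(TOY_GRAPH, "a")
ranks = pagerank(TOY_GRAPH)

# Average PageRank of the first k pages fetched, for k = 1 .. number of pages.
running_average = [
    sum(ranks[p] for p in order[:k]) / k for k in range(1, len(order) + 1)
]

for k, avg in enumerate(running_average, start=1):
    print(k, order[k - 1], round(avg, 4))
```

On a graph this small the running average says little. The paper's claim concerns a crawl of 328 million pages, where the average quality of pages downloaded early in a breadth-first crawl can be compared with that of pages downloaded later.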