Abstract | P2P deployments are a natural infrastructure for building distributed search networks. Proposed systems support
locating and retrieving all results, but lack the information
necessary to rank them. Users, however, are primarily
interested in the most relevant results, not in all possible results.
Using random sampling, we extend a class of well-known
information retrieval ranking algorithms such that
they can be applied in this distributed setting. We analyze
the overhead of our approach, and quantify how
our system scales with the number of documents,
system size, document-to-node mapping (uniform versus
non-uniform), and query type (rare versus popular
terms). Our analysis and simulations show that a) these
extensions are efficient, and can scale with little overhead
to large systems, and b) the accuracy of the results obtained
using distributed ranking is comparable to that of a centralized
implementation.
|