Abstract | The web is becoming the preferred medium for communicating and storing information pertaining to almost any human activity.
However, it is an ephemeral medium whose contents are
constantly changing, resulting in a permanent loss of part of our
cultural and scientific heritage on a regular basis. Archiving
important web contents is a very challenging technical problem
due to the web's tremendous scale, complex structure, extremely
dynamic nature, and rich, heterogeneous, and deep contents. In
this paper, we consider the problem of archiving a linked set of
web objects into web containers in such a way as to minimize the
number of containers accessed during a typical browsing session.
We develop a method that makes use of the notion of PageRank
and optimized graph partitioning to enable faster browsing of
archived web contents. We include simulation results that
illustrate the performance of our scheme and compare it to the
common scheme currently used to organize web objects into web
containers.
|