Abstract

We introduce a novel active learning algorithm for classification of network data. In
this setting, training instances are connected by a set of links to form a network, the
labels of linked nodes are correlated, and the goal is to exploit these dependencies and
accurately label the nodes. This problem arises in many domains, including social and
biological network analysis and document classification, and there has been much recent
interest in methods that collectively classify the nodes in the network. While in many
cases labeled examples are expensive, often network information is available. We show
how an active learning algorithm can take advantage of network structure. Our algorithm
effectively exploits the links between instances and the interaction between the local
and collective aspects of a classifier to improve the accuracy of learning from fewer
labeled examples. We experiment with two real-world benchmark collective classification
domains, and show that we are able to achieve extremely accurate results even when only
a small fraction of the data is labeled.