Abstract. We are often faced with the challenge of mining data represented in relational form. Unfortunately, most statistical learning methods work only with "flat" data representations. Thus, to apply these methods, we are forced to convert the data into a flat form, thereby not only losing its compact representation and structure but also potentially introducing statistical skew. These drawbacks severely limit the ability of current statistical methods to mine relational databases. Probabilistic models, in particular probabilistic relational models, allow us to represent a statistical model over a relational domain. These models can represent correlations between attributes within a single table, and between attributes in multiple tables, when these tables are related via foreign key joins. In previous work [4, 6, 8], we have developed algorithms for automatically constructing a probabilistic relational model directly from a relational database. We survey those results here and describe how the methods can be used to discover interesting dependencies in the data. We show how this class of models and our construction algorithm are ideally suited to mining multi-relational data.