Indexing correlated probabilistic databases
Title | Indexing correlated probabilistic databases |
Publication Type | Conference Papers |
Year of Publication | 2009 |
Authors | Kanagal B, Deshpande A |
Conference Name | Proceedings of the 35th SIGMOD international conference on Management of data |
Date Published | 2009 |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-60558-551-2 |
Keywords | caching, indexing, inference queries, junction trees, probabilistic databases |
Abstract | With large amounts of correlated probabilistic data being generated in a wide range of application domains, including sensor networks, information extraction, and event detection, effectively managing and querying such data has become an important research direction. While there is an extensive body of literature on querying independent probabilistic data, supporting efficient queries over large-scale, correlated databases remains a challenge. In this paper, we develop efficient data structures and indexes for supporting inference and decision support queries over such databases. Our proposed hierarchical data structure is suitable for both in-memory and disk-resident databases. We represent the correlations in the probabilistic database using a junction tree over the tuple-existence or attribute-value random variables, and use tree partitioning techniques to build an index structure over it. We show how to efficiently answer inference and aggregation queries using such an index, resulting in orders-of-magnitude performance benefits in most cases. In addition, we develop novel algorithms for efficiently keeping the index structure up to date as changes (inserts, updates) are made to the probabilistic database. We present a comprehensive experimental study illustrating the benefits of our approach to query processing in probabilistic databases. |
URL | http://doi.acm.org/10.1145/1559845.1559894 |
DOI | 10.1145/1559845.1559894 |
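The abstract's central idea is to partition the junction tree and precompute summaries over the partitions, so an inference query can skip large regions of the model. The toy sketch below illustrates only that intuition and is not the paper's index or partitioning algorithm. It assumes the simplest case: a chain-structured model, where the junction tree is a path with cliques {X_i, X_{i+1}}, split into fixed-size blocks. The names `ChainIndex`, `chain_product`, and `joint` are hypothetical. The sketch also omits the hierarchical multi-level structure, disk layout, and update handling described in the abstract.

```python
from typing import List

# Assumption: a chain-structured model, i.e. the junction tree is a path whose
# cliques are {X_i, X_{i+1}}; trans[i][s][t] = P(X_{i+1} = t | X_i = s).
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def chain_product(trans: List[Matrix], lo: int, hi: int) -> Matrix:
    """Baseline: multiply trans[lo] @ ... @ trans[hi-1] one step at a time."""
    result = identity(len(trans[0]))
    for t in trans[lo:hi]:
        result = matmul(result, t)
    return result


class ChainIndex:
    """Hypothetical index: partitions the chain into fixed-size blocks and caches
    each block's product (a "shortcut") so long-range queries skip whole blocks."""

    def __init__(self, trans: List[Matrix], block: int):
        self.trans = trans
        self.block = block
        # One shortcut per block; the last block may be shorter than `block`.
        self.shortcuts = [chain_product(trans, s, min(s + block, len(trans)))
                          for s in range(0, len(trans), block)]

    def product(self, lo: int, hi: int) -> Matrix:
        """Same result as chain_product(trans, lo, hi), using cached blocks
        wherever a whole block lies inside [lo, hi)."""
        result = identity(len(self.trans[0]))
        i = lo
        while i < hi:
            if i % self.block == 0 and i + self.block <= hi:
                result = matmul(result, self.shortcuts[i // self.block])
                i += self.block
            else:
                result = matmul(result, self.trans[i])
                i += 1
        return result


def joint(p0: List[float], index: ChainIndex, a: int, b: int) -> Matrix:
    """P(X_a = s, X_b = t) for a <= b, given the initial distribution p0 of X_0."""
    pa = matmul([p0], index.product(0, a))[0]   # marginal of X_a
    cond = index.product(a, b)                  # P(X_b = t | X_a = s)
    return [[pa[s] * cond[s][t] for t in range(len(cond[0]))]
            for s in range(len(pa))]
```

For example, with 1,000 transition steps and `block=32`, `product(0, 1000)` performs 39 matrix products (31 cached blocks plus 8 single steps) versus 1,000 for `chain_product`. Both return the same matrix up to floating-point rounding. The cost of building the shortcuts is paid once, at index construction. Because changing a single transition invalidates only the one shortcut covering it, the sketch also hints at why partition-based indexes can be kept current incrementally.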