Understanding tuberculosis epidemiology using structured statistical models

Title: Understanding tuberculosis epidemiology using structured statistical models
Publication Type: Journal Articles
Year of Publication: 2004
Authors: Getoor L, Rhee JT, Koller D, Small P
Journal: Artificial Intelligence in Medicine
Volume: 30
Issue: 3
Pagination: 233 - 256
Date Published: 2004/03
ISBN Number: 0933-3657
Keywords: Bayesian networks, epidemiology, Probabilistic and statistical relational models, Tuberculosis
Abstract

Molecular epidemiological studies can provide novel insights into the transmission of infectious diseases such as tuberculosis. Typically, risk factors for transmission are identified using traditional hypothesis-driven statistical methods such as logistic regression. However, limitations become apparent in these approaches as the scope of these studies expands to include additional epidemiological and bacterial genomic data. Here we examine the use of Bayesian models to analyze tuberculosis epidemiology. We begin by exploring the use of Bayesian networks (BNs) to identify the distribution of tuberculosis patient attributes (including demographic and clinical attributes). Using existing algorithms for constructing BNs from observational data, we learned a BN from data about tuberculosis patients collected in San Francisco from 1991 to 1999. We verified that the resulting probabilistic models did in fact capture known statistical relationships. Next, we examine the use of newly introduced methods for representing and automatically constructing probabilistic models in structured domains. We use statistical relational models (SRMs) to model distributions over relational domains. SRMs are ideally suited to richly structured epidemiological data. We use a data-driven method to construct a statistical relational model directly from data stored in a relational database. The resulting model reveals the relationships between variables in the data and describes their distribution. We applied this procedure to the data on tuberculosis patients in San Francisco from 1991 to 1999, their Mycobacterium tuberculosis strains, and data on contact investigations. The resulting statistical relational model corroborated previously reported findings and revealed several novel associations. These models illustrate the potential for this approach to reveal relationships within richly structured data that may not be apparent using conventional statistical approaches. We show that Bayesian methods, in particular statistical relational models, are an important tool for understanding infectious disease epidemiology.

URL: http://www.sciencedirect.com/science/article/pii/S0933365703001337
DOI: 10.1016/j.artmed.2003.11.003