Deconstructing nuggets: the stability and reliability of complex question answering evaluation

Title	Deconstructing nuggets: the stability and reliability of complex question answering evaluation
Publication Type	Conference Papers
Year of Publication	2007
Authors	Jimmy Lin, Pengyi Zhang
Conference Name	Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Date Published	2007
Publisher	ACM
Conference Location	New York, NY, USA
ISBN Number	978-1-59593-597-7
Keywords	complex information needs, human judgments, TREC
Abstract	A methodology based on "information nuggets" has recently emerged as the de facto standard by which answers to complex questions are evaluated. After several implementations in the TREC question answering tracks, the community has gained a better understanding of its many characteristics. This paper focuses on one particular aspect of the evaluation: the human assignment of nuggets to answer strings, which serves as the basis of the F-score computation. As a byproduct of the TREC 2006 ciQA task, identical answer strings were independently evaluated twice, which allowed us to assess the consistency of human judgments. Based on these results, we explored simulations of assessor behavior that provide a method to quantify scoring variations. Understanding these variations in turn lets researchers be more confident in their comparisons of systems.
URL	http://doi.acm.org/10.1145/1277741.1277799
DOI	10.1145/1277741.1277799
