Abstract | Question answering systems have become increasingly popular because they deliver users short, succinct answers instead of overloading them with a large number of
irrelevant documents. The vast amount of information readily available on the World
Wide Web presents new opportunities and challenges for question answering. In order
for question answering systems to benefit from this vast store of useful knowledge, they
must cope with large volumes of useless data.
Many characteristics of the World Wide Web distinguish Web-based question answering
from question answering on closed corpora such as newspaper texts. The Web is vastly
larger in size and boasts incredible “data redundancy,” which renders it amenable to
statistical techniques for answer extraction. A data-driven approach can yield high levels
of performance and nicely complements traditional question answering techniques
driven by information extraction.
In addition to enormous amounts of unstructured text, the Web also contains pockets of
structured and semistructured knowledge that can serve as a valuable resource for
question answering. By organizing these resources and annotating them with natural
language, we can successfully incorporate Web knowledge into question answering
systems.
This tutorial surveys recent Web-based question answering technology, focusing on two
separate paradigms: knowledge mining using statistical tools and knowledge annotation
using database concepts. Both approaches can employ a wide spectrum of techniques
ranging in linguistic sophistication from simple “bag-of-words” treatments to full
syntactic parsing.
|