Vast amounts of information (from newspapers, journals, legal transcripts, conference proceedings, correspondence, web pages, and other sources) have become increasingly accessible online. Yet current keyword-based search technologies offer little support to users searching for a few relevant text fragments among thousands of documents.
Open-domain question answering has recently emerged as a new field aimed at extracting brief, relevant answers from large text collections in response to written questions submitted by users. Related fields taken individually, such as natural language processing or information retrieval, do not provide practicable solutions to open-domain question answering. For example, document retrieval alone is insufficient because relevant information is concentrated in fragments that are small compared to the size of the entire document. Advanced methods based on higher levels of text understanding cannot be applied directly to gigabyte-sized collections of unrestricted text. Similarly, the amount of knowledge required by an open-domain question answering system to act as an intelligent conversational agent is beyond the boundaries of present technologies.
This book presents the design of novel and robust methods for capturing the semantics of natural language questions and for finding relevant text snippets. The theoretical contributions of this research are reflected in a fully implemented architecture whose performance was evaluated within the DARPA-sponsored Text Retrieval Conference. In addition, experimental results show significant qualitative improvements with respect to the output from web search engines, revealing both the challenges and desired features of next-generation web search technologies.
Marius Pasca is Director of Question Answering Research and Development at Language Computer Corporation.
4/15/2003