Abstract:
A Question Answering (QA) system provides answers to user questions by accessing its knowledge base. The generic QA framework involves processing the asked question to better understand it, querying a knowledge base to retrieve passages likely to contain answers, and finally extracting answers from the retrieved passages using various Natural Language Processing techniques. The output of a QA system depends on the output of each of its phases as processing moves through the framework. A remaining problem is validating whether the passages returned by the passage retrieval module actually contain the expected answers to the asked questions. Moreover, answer extraction based on lexical and syntactic similarities alone does not provide sufficient coverage for identifying correct answers in a QA framework. This work is therefore motivated by the effect of infusing validation techniques into the QA framework, the need to validate paragraphs beyond the lexical level, and the need to capture semantic variation in answer extraction techniques based on similarity features. In this thesis, a QA framework that validates entailment between asked questions and retrieved paragraphs is proposed. In addition, four similarity scores (word form, word order, distance, and semantic similarity) are implemented for answer extraction. Instant snippets returned by the Google search engine are used as the corpus from which candidate answer sets are generated. On a dataset of 1370 factoid questions, the proposed method achieved an accuracy of 77.71%, a precision of 77.91%, a recall of 91.37%, and an F1-measure of 91.37%. Our findings show that the validation technique reduces the time the system spends analysing passages without possible answers, and that the semantic scorer increases the precision with which correct answers are identified in factoid QA systems. The proposed system can be adapted for use in automatic QA systems and for grading factoid Computer-Based Tests.
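To make the answer-extraction idea concrete, the four similarity scores named above could be combined as a weighted sum over a question and a candidate answer passage. The following is a minimal sketch only: the weighting, the exact scorer definitions, and the toy `SYNONYMS` table (standing in for a real semantic resource such as WordNet) are illustrative assumptions, not the thesis's actual implementation.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table; a real system would use a semantic
# resource such as WordNet instead of this toy stand-in.
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}

def word_form(q, a):
    """Word-form similarity: Jaccard overlap of the two token sets."""
    qs, cs = set(q), set(a)
    return len(qs & cs) / len(qs | cs) if qs | cs else 0.0

def word_order(q, a):
    """Word-order similarity: fraction of shared-token pairs whose
    relative order in the candidate matches their order in the question."""
    shared = [t for t in q if t in a]
    if len(shared) < 2:
        return 1.0 if shared else 0.0
    pairs = [(shared[i], shared[j])
             for i in range(len(shared)) for j in range(i + 1, len(shared))]
    same = sum(1 for x, y in pairs if a.index(x) < a.index(y))
    return same / len(pairs)

def distance(q, a):
    """Distance similarity: surface edit similarity of the joined strings."""
    return SequenceMatcher(None, " ".join(q), " ".join(a)).ratio()

def semantic(q, a):
    """Semantic similarity: token overlap after synonym expansion."""
    hits = sum(1 for t in q if t in a or SYNONYMS.get(t, set()) & set(a))
    return hits / len(q) if q else 0.0

def score(question, candidate, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted combination of the four scores (equal weights assumed)."""
    q, a = question.lower().split(), candidate.lower().split()
    parts = (word_form(q, a), word_order(q, a), distance(q, a), semantic(q, a))
    return sum(w * s for w, s in zip(weights, parts))
```

For example, `score("who invented the car", "the automobile was invented by karl benz")` ranks above an unrelated passage because the semantic scorer credits the car/automobile substitution that purely lexical matching would miss.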