Abstract:
Although a vast amount of information is available today in digital form, there is still no effective mechanism that gives people convenient access to it. Information Retrieval (IR) and Question Answering (QA) systems are the two mechanisms currently available for information access. IR systems typically return a long list of documents in response to a user's query, which the user must skim to determine whether they contain an answer. A QA system, in contrast, allows the user to state an information need as a natural language question and returns the most appropriate answer as a word, a sentence, or a paragraph. To design and develop the system presented in this thesis, we used the Python programming language, the Whoosh indexing and search engine for document retrieval, the Orange3 data mining tool for expected answer type (EAT) classification, and Eclipse and Anaconda as the development environment. The AFQA system comprises an indexing module, a question analysis module, a document analysis module, and answer extraction and selection modules. The question analysis module takes a user question as input, generates and expands a query, and determines the question keywords and the expected answer type. The document analysis module pre-processes documents before indexing. The answer extraction module performs a detailed analysis of the retrieved content based on the expected answer type. For document retrieval, we applied a probabilistic relevance IR model that retrieves sentences according to the probability of relevance of each query-document pair. The top-ranked sentences serve as the candidate answer-bearing documents. In the answer extraction and selection module, we used gazetteers and regular-expression-based pattern matching to extract and select the exact answers presented to the user.
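As an illustration of the gazetteer and pattern-matching step described above, the following is a minimal sketch, not the AFQA implementation. The gazetteer entries, the regular expressions, the answer-type labels, and the function names `extract_candidates` and `select_answer` are all hypothetical, and English strings stand in for Amharic text. The selection rule (take the first candidate from the highest-ranked sentence that contains one) is likewise an assumption.

```python
import re

# Hypothetical gazetteers and patterns, for illustration only; the actual
# AFQA resources are not given in the abstract.
GAZETTEERS = {
    "PERSON": {"Abebe Bikila", "Haile Gebrselassie"},
    "LOCATION": {"Addis Ababa", "Bahir Dar", "Rome"},
}
PATTERNS = {
    "DATE": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),  # four-digit years
    "NUMBER": re.compile(r"\b\d+(?:\.\d+)?\b"),
}


def extract_candidates(sentence, eat):
    """Return candidate answers of the expected answer type (EAT) in a sentence."""
    if eat in PATTERNS:
        # Pattern-based types: every regex match is a candidate.
        return [m.group(0) for m in PATTERNS[eat].finditer(sentence)]
    # Gazetteer-based types: keep entries that occur in the sentence,
    # ordered by their position in the sentence.
    entries = GAZETTEERS.get(eat, ())
    found = [(sentence.find(e), e) for e in entries if e in sentence]
    return [e for _, e in sorted(found)]


def select_answer(ranked_sentences, eat):
    """Pick the first candidate from the highest-ranked sentence that has one."""
    for sentence in ranked_sentences:
        candidates = extract_candidates(sentence, eat)
        if candidates:
            return candidates[0]
    return None


if __name__ == "__main__":
    ranked = [
        "The marathon drew a large crowd.",
        "Abebe Bikila won the marathon in Rome in 1960.",
    ]
    print(select_answer(ranked, "PERSON"))
    print(select_answer(ranked, "DATE"))
```

In this sketch the expected answer type decides which resource is consulted, so an accurate EAT classifier directly narrows the set of strings considered as answers.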
To evaluate the system, we prepared a dataset of 1,500 questions, from which we selected 450 to test the system's components. The experimental evaluation shows that the SVM algorithm achieves 99.7% accuracy in EAT classification. The document retrieval component retrieves 94.4% of the relevant sentences for the questions. The answer selection component achieves a precision of 0.845, a recall of 0.935, and an F1-score of 0.888. We attribute these high precision and recall values to two factors: the high accuracy of EAT classification, and the fact that the document retrieval component retrieves more relevant than irrelevant sentences. The overall accuracy of the AFQA system is 79.8%. For future work, we recommend applying deep NLP tools, anaphora resolution techniques, and Word Sense Disambiguation, supporting speech-based question answering, and integrating Amharic spelling checkers into the system. In general, our algorithms and tools performed well compared with previous related research.
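As a quick consistency check on the reported figures, the F1-score can be recomputed from the stated precision and recall using the standard harmonic-mean definition. The helper name `f1_score` is ours and is not taken from the thesis.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Values reported for the answer selection component.
    print(round(f1_score(0.845, 0.935), 3))  # prints 0.888
```

The result agrees with the reported F1-score of 0.888.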