Abstract:
A Question Answering System (QAS) allows users to ask natural language questions about a
topic and obtain precise answers from a collection of documents. Many
people utilize internet health information platforms to look up information on symptoms,
diseases, or any other health-related topic that interests them. Within a closed-domain Amharic
Healthcare Question Answering System (AHQAS), this thesis focuses on both factoid and non-factoid
question types: definition, description, factoid numeric, and list. Several
studies have previously been conducted on Amharic QAS. Most of them used only rule-based
and machine learning methods, which offer limited semantic representation and rely on highly
engineered feature extraction; the potential of deep learning has not yet been exploited. By
extracting features automatically and representing words semantically as dense vectors, the
deep learning approach, together with word embeddings, improves the
performance of QAS. The aim of this study is to improve the performance of Amharic QAS
using deep learning and a pre-trained Amharic word2vec model. To this end,
we design a system with seven
main components: data preprocessing, word embedding, LSTM/BiLSTM model creation,
model training, question type classification, cosine similarity scoring, and answer
extraction. The problem of identifying training questions that are semantically equivalent to the
newly asked question is investigated in this thesis, under the assumption that the answers to similar
questions should also answer the new one. If a training question with a relevant answer is
identified, that answer can be returned as the response to the new query. The similarity
between the new and training questions is calculated using cosine similarity.
We obtained 1,800 Amharic healthcare question-answer pairs from Hellodoctorethiopia, Doctor Alle,
healthinfoamharic, medicinenet, Dradugnaw, and experts. The dataset is
split into two parts: training and testing. For the question type classification task, Bi-LSTM
classified about 98.1% of the questions correctly, outperforming the rule-based and ML
approaches. The question answering component achieved an F-score of 87.5% in retrieving
correct answers. Based on the results, Bi-LSTM can be used as a question classification method
for QAS, and word embeddings generated by neural networks can replace manually designed
features, which is an important advantage. In this study, the user types questions on a keyboard
to obtain answers from the system. However, some people cannot write Amharic and may find it
easier to use voice input. In addition, our system does not assist people who have difficulty using
their hands or who have a visual impairment. Therefore, future work should
develop a system that accepts speech-based healthcare questions from the user.
Keywords: Question Answering, Deep Learning, Natural Language Processing, LSTM,
BiLSTM, Question Type Classification, Answer Extraction.
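
The retrieval idea summarized in the abstract (return the answer of the training question most similar to the new question, scored by cosine similarity) can be illustrated with a minimal sketch. It is not the thesis implementation. The abstract does not say how a question is turned into a single vector, so averaging word vectors is an assumption made here, and the names average_vector, retrieve_answer, TOY_EMBEDDINGS, TOY_QA_PAIRS, and the threshold parameter are hypothetical. The toy three-dimensional vectors and English tokens stand in for the pre-trained Amharic word2vec model and the real dataset.

```python
import math

# Toy stand-in for a pre-trained word2vec lookup (hypothetical values).
TOY_EMBEDDINGS = {
    "malaria": [0.9, 0.1, 0.0],
    "symptoms": [0.1, 0.9, 0.0],
    "signs": [0.2, 0.8, 0.1],
    "diabetes": [0.0, 0.1, 0.9],
    "causes": [0.1, 0.0, 0.8],
}

# Toy training question-answer pairs (placeholder answers, not real data).
TOY_QA_PAIRS = [
    ("malaria symptoms", "ANSWER_ABOUT_MALARIA_SYMPTOMS"),
    ("diabetes causes", "ANSWER_ABOUT_DIABETES_CAUSES"),
]


def average_vector(tokens, embeddings):
    """Average the vectors of known tokens (assumed question encoding).

    Returns None when no token has an embedding.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_answer(question, qa_pairs, embeddings, threshold=0.5):
    """Return (answer, score) for the most similar training question.

    The answer is None when nothing scores at or above the threshold.
    """
    query_vec = average_vector(question.split(), embeddings)
    if query_vec is None:
        return None, 0.0
    best_answer, best_score = None, 0.0
    for train_question, answer in qa_pairs:
        train_vec = average_vector(train_question.split(), embeddings)
        if train_vec is None:
            continue
        score = cosine_similarity(query_vec, train_vec)
        if score > best_score:
            best_answer, best_score = answer, score
    if best_score < threshold:
        return None, best_score
    return best_answer, best_score


if __name__ == "__main__":
    answer, score = retrieve_answer("malaria signs", TOY_QA_PAIRS, TOY_EMBEDDINGS)
    print(answer, round(score, 3))
```

In this sketch the threshold guards against returning an answer when no training question is close enough to the new one; the value 0.5 is arbitrary and would need tuning on held-out data.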