Designing Amharic Question Answering model for Healthcare Using Deep Learning Approach

Abreham, Alene Taye

URI: http://ir.bdu.edu.et/handle/123456789/14782

Date: 2022-11

Abstract:

The Question Answering System (QAS) is a system that allows users to ask natural language questions about a topic and obtain precise answers from a collection of documents. Many people use online health information platforms to look up information on symptoms, diseases, or other health-related topics that interest them. Within a closed-domain Amharic Healthcare Question Answering System (AHQAS), this thesis addresses both factoid and non-factoid question types: Definition, Description, Factoid numeric, and List. Several researchers have previously worked on Amharic QAS. Most of these studies used only rule-based and machine learning methods, which suffer from limited semantic representation and depend on heavily engineered feature extraction. The potential of deep learning has not yet been exploited. Deep learning approaches combined with word embeddings can improve QAS performance by extracting features automatically and representing words semantically as dense vectors. The aim of this study is to improve the performance of Amharic QAS using deep learning and a pre-trained Amharic word2vec model. To this end, we design a system with seven main components: data preprocessing, word embedding, creating the LSTM/BiLSTM model, training the model, question type classification, computing the cosine similarity score, and extracting answers. This thesis investigates the problem of identifying training questions that are semantically equivalent to a new question, under the assumption that answers to similar questions should also answer the new one. If a similar question with a relevant answer is identified, its answer is returned as the response to the new query. The similarity between the new question and the training questions is calculated using cosine similarity.
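The retrieval step described above (restrict candidates by question type, score them by cosine similarity, return the answer of the most similar training question) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract does not say how a question is turned into a single vector, so this sketch simply averages its word vectors, whereas the thesis uses LSTM/BiLSTM models. The function names, the toy two-dimensional embeddings, and the placeholder answers are all hypothetical; in practice the vectors would come from the pre-trained Amharic word2vec model and the question type from the trained classifier.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
# Hypothetical record layout: (question tokens, question type, answer text)
QAPair = Tuple[Sequence[str], str, str]


def sentence_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of in-vocabulary tokens (an assumed, simple representation)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return None  # no token of the question is in the embedding vocabulary
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """cos(a, b) = a.b / (|a| |b|); defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_answer(
    question_tokens: Sequence[str],
    question_type: str,
    training_pairs: Sequence[QAPair],
    embeddings: Dict[str, Vector],
) -> Tuple[Optional[str], float]:
    """Return (answer, score) of the most similar training question of the same type."""
    query = sentence_vector(question_tokens, embeddings)
    if query is None:
        return None, 0.0
    best_answer: Optional[str] = None
    best_score = -1.0
    for tokens, qtype, answer in training_pairs:
        if qtype != question_type:  # only compare questions of the predicted type
            continue
        candidate = sentence_vector(tokens, embeddings)
        if candidate is None:
            continue
        score = cosine_similarity(query, candidate)
        if score > best_score:
            best_answer, best_score = answer, score
    if best_answer is None:
        return None, 0.0
    return best_answer, best_score


if __name__ == "__main__":
    # Toy 2-D vectors standing in for word2vec embeddings (hypothetical values).
    toy_embeddings = {
        "ወባ": [1.0, 0.0],
        "ምንድን": [0.0, 1.0],
        "ነው": [0.0, 1.0],
        "ምልክቶች": [1.0, 1.0],
        "ናቸው": [0.0, 1.0],
    }
    toy_pairs = [
        (["ወባ", "ምንድን", "ነው"], "Definition", "ANSWER_1"),
        (["የወባ", "ምልክቶች", "ምንድን", "ናቸው"], "List", "ANSWER_2"),
    ]
    print(retrieve_answer(["ወባ", "ምንድን", "ነው"], "Definition", toy_pairs, toy_embeddings))
```

Filtering by the predicted question type before scoring keeps, for example, a List question from being answered with a Definition answer. A minimum-similarity threshold, below which no answer is returned, would be a natural extension, but the abstract does not state whether one is used.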
A dataset of 1,800 Amharic healthcare question-answer pairs was collected from Hellodoctorethiopia, Doctor Alle, healthinfoamharic, medicinenet, Dradugnaw, and experts. The dataset is split into training and testing sets. For question type classification, Bi-LSTM correctly classified about 98.1% of the questions, outperforming the rule-based and ML approaches. For question answering, the system achieved an F-score of 87.5% in retrieving correct answers. Based on these results, Bi-LSTM can be used as the question classification method for QAS, and word embeddings generated by neural networks can replace manually designed features, which is an important advantage. In our study, the user types questions on a keyboard to obtain answers from the system. However, some people may not be able to write Amharic and may find it easier to use their voice. Our model also does not assist people who have difficulty using their hands or who have visual impairments. Therefore, future work should develop a system that accepts spoken healthcare-related questions from the user.

Keywords: Question Answering, Deep Learning, Natural Language Processing, LSTM, BiLSTM, Question Type Classification, Answer Extraction.