Amharic Question Generation from Amharic Legal Text Documents by Using Deep Learning Approach

FIKIR, SETIE TEZERA

dc.contributor.author	FIKIR, SETIE TEZERA
dc.date.accessioned	2022-11-16T07:34:25Z
dc.date.available	2022-11-16T07:34:25Z
dc.date.issued	2022-07
dc.identifier.uri	http://ir.bdu.edu.et/handle/123456789/14385
dc.description.abstract	Questioning is the main tool for acquiring knowledge in our day-to-day activities. However, constructing questions manually is time-consuming and expensive, and it requires experts in the area. Automatic question generation can reduce both the construction time and the need for human labor. Numerous studies on question generation have been done for high-resource languages such as English and Chinese using various recent techniques. For Amharic, however, only two works have addressed question generation, and both use traditional rule- and template-based approaches. These approaches require handcrafted rules and templates, which are time-consuming and tedious to build, and the model's performance depends heavily on the size and quality of those rules and templates. As a result, they are not effective for big data. In addition, no question generation dataset is available. To overcome these problems, this study applies deep learning to the Amharic question generation problem, proposing a neural network combined with rules. Because Amharic is a low-resource language for NLP, we construct rules to generate the questions. The input is preprocessed with tokenization, normalization, stop-word removal, and stemming, and then fed to deep learning models (CNN, LSTM, and Bi-LSTM) that generate questions from the given input. Because no question generation training dataset exists, the training data was prepared manually, which was tedious and time-consuming. The dataset contains about 6,100 question-answer pairs, each labeled with one of the following classes: (0) how much, (1) who, (2) what, (3) when, (4) where, and (5) other. Accuracy, precision, F-measure, and the confusion matrix are the performance measures used to assess the models' overall effectiveness on this dataset. 
In the study's third trial, LSTM, CNN, and Bi-LSTM achieved maximum accuracies of 92%, 94%, and 95%, respectively. The results show that the proposed Bi-LSTM handles the challenges of Amharic question generation better than the other two models. Keywords: Amharic, Question Generator, Deep Learning, word2vec, Natural Language Processing.	en_US
dc.language.iso	en_US	en_US
dc.subject	FACULTY OF COMPUTING	en_US
dc.title	Amharic Question Generation from Amharic Legal Text Documents by Using Deep Learning Approach	en_US
dc.type	Thesis	en_US
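The abstract names four preprocessing steps (tokenization, normalization, stop-word removal, and stemming) without detailing them. Below is a minimal standard-library Python sketch of such a pipeline. The homophone normalization table, the stop-word list, and the suffix list are hypothetical illustrations, not the resources used in the thesis, and the naive suffix stripping is only a stand-in for real Amharic morphological stemming.

```python
import re

# Hypothetical, partial homophone normalization: map the first seven orders of
# the ሐ and ኀ (-> ሀ), ሠ (-> ሰ), ዐ (-> አ) and ፀ (-> ጸ) series onto one canonical
# series by Unicode code point offset. A real system would use a fuller table.
_SERIES = [(0x1210, 0x1200), (0x1280, 0x1200), (0x1220, 0x1230),
           (0x12D0, 0x12A0), (0x1340, 0x1338)]
NORMALIZE = {chr(src + i): chr(dst + i) for src, dst in _SERIES for i in range(7)}

# Token boundaries: whitespace plus Ethiopic punctuation U+1361..U+1368
# (word space ፡, full stop ።, comma ፣, semicolon ፤, question mark ፧, ...).
TOKEN_SPLIT = re.compile(r"[\s\u1361-\u1368]+")

# Hypothetical resources, for illustration only.
STOP_WORDS = {"እና", "ወይም", "ግን"}  # "and", "or", "but"
SUFFIXES = ("ዎች", "ች")             # naive plural endings, longest first


def normalize(text: str) -> str:
    """Replace homophone characters with their canonical forms."""
    return "".join(NORMALIZE.get(ch, ch) for ch in text)


def tokenize(text: str) -> list[str]:
    """Split on whitespace and Ethiopic punctuation, dropping empty strings."""
    return [tok for tok in TOKEN_SPLIT.split(text) if tok]


def stem(token: str, min_stem: int = 2) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, remove stop words, then stem each remaining token."""
    tokens = tokenize(normalize(text))
    return [stem(tok) for tok in tokens if tok not in STOP_WORDS]
```

For example, `preprocess("ሐሳብ እና ተማሪዎች።")` normalizes ሐ to ሀ, drops the conjunction እና, and strips the plural ending, giving `['ሀሳብ', 'ተማሪ']`.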
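The abstract describes a neural network combined with rules but does not spell the rules out. One plausible reading, sketched here purely as an assumption, is that the predicted class selects an Amharic interrogative word that replaces the answer span in the source sentence. The class labels come from the abstract; the `QUESTION_WORDS` table and the `make_question` function are hypothetical and are not the thesis's actual rules.

```python
from typing import Optional

# Class labels as listed in the abstract.
CLASS_NAMES = {0: "how much", 1: "who", 2: "what", 3: "when", 4: "where", 5: "other"}

# Hypothetical rule table: Amharic interrogative word for each class.
# Class 5 ("other") has no fixed question word.
QUESTION_WORDS = {0: "ምን ያህል", 1: "ማን", 2: "ምን", 3: "መቼ", 4: "የት"}


def make_question(sentence: str, answer: str, label: int) -> Optional[str]:
    """Replace the first occurrence of `answer` with the question word for `label`.

    Returns None when the class has no question word or when the answer
    span does not occur in the sentence.
    """
    word = QUESTION_WORDS.get(label)
    if word is None or answer not in sentence:
        return None
    question = sentence.replace(answer, word, 1)
    # Drop the Ethiopic full stop (።) and trailing spaces, then add "?".
    return question.rstrip("። ") + "?"
```

For instance, `make_question("አበበ ትናንት መጣ።", "ትናንት", 3)` returns `"አበበ መቼ መጣ?"` ("When did Abebe come?"). Keeping the answer span explicit means each generated question stays paired with its answer, matching the question-answer format of the dataset.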
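The abstract reports accuracy, precision, F-measure, and the confusion matrix. As a sketch of how these can be computed for the six class labels (0 to 5), the standard-library Python functions below build a confusion matrix (rows are true classes, columns are predicted classes) and derive accuracy and macro-averaged scores from it. Macro averaging is an assumption; the abstract does not state which averaging was used.

```python
def confusion_matrix(y_true, y_pred, n_classes=6):
    """Count label pairs: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_scores(matrix):
    """Return (precision, recall, F1) for each class; 0.0 when undefined."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[i][k] for i in range(n))  # column sum
        actual = sum(matrix[k])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append((precision, recall, f1))
    return scores


def macro(scores, index):
    """Unweighted mean of one score column (0=precision, 1=recall, 2=F1)."""
    return sum(s[index] for s in scores) / len(scores)
```

On a toy three-class example with true labels `[0, 1, 2, 2]` and predictions `[0, 1, 2, 1]`, accuracy is 0.75, macro precision is 5/6, and macro F-measure is 7/9. Macro averaging weights every question class equally, so a rare class such as "how much" affects the score as much as a frequent one.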