BDU IR

Amharic Question Generation from Amharic legal Text Documents by Using Deep Learning Approach

dc.contributor.author FIKIR, SETIE TEZERA
dc.date.accessioned 2022-11-16T07:34:25Z
dc.date.available 2022-11-16T07:34:25Z
dc.date.issued 2022-07
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/14385
dc.description.abstract Questioning is a main tool for acquiring knowledge in day-to-day activities. However, constructing questions manually is time-consuming and expensive, and it requires domain experts. Automatic question generation can therefore reduce both the construction time and the need for human labor. Numerous studies have addressed question generation in high-resource languages such as English and Chinese using a variety of recent techniques. For Amharic, however, only two works address the problem, and both use traditional approaches, namely rule-based and template-based methods. These approaches need hand-crafted rules and templates, which are time-consuming and tedious to write, and the performance of the resulting system depends heavily on the size and quality of those rules and templates, so they are not effective for large volumes of data. There is also no available question generation dataset. To overcome these problems, this study applies deep learning to Amharic question generation, using a neural network with attached rules. Because Amharic is a low-resource language for NLP, we construct rules to generate questions. The text is preprocessed with tokenization, normalization, stop word removal, and stemming, and then fed to the deep learning models (CNN, LSTM, and Bi-LSTM), which generate questions from the given input. The training data was prepared manually, which was tedious and time-consuming, because no question generation training dataset was available. It contains about 6,100 question-answer pairs, each assigned to one of five question-word classes or an "others" class: (0) how much, (1) who, (2) what, (3) when, (4) where, and (5) others. Accuracy, precision, F-measure, and the confusion matrix are the performance measures used to assess the models' overall effectiveness on this dataset.
According to the performance measurements, in the third trial of this study LSTM, CNN, and Bi-LSTM achieved maximum accuracies of 92%, 94%, and 95%, respectively. The results show that the proposed Bi-LSTM handled the challenges of Amharic question generation better than the other two models. Key words: Amharic, Question Generator, Deep Learning, word2vec, Natural Language Processing. en_US
dc.language.iso en_US en_US
dc.subject FACULTY OF COMPUTING en_US
dc.title Amharic Question Generation from Amharic legal Text Documents by Using Deep Learning Approach en_US
dc.type Thesis en_US
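
The abstract names accuracy, precision, F-measure, and the confusion matrix as evaluation measures over the six class labels (0 to 5). The following is a minimal sketch of how such measures can be computed from true and predicted labels. It is not the thesis's evaluation code: the function names and the toy label lists are hypothetical, and F-measure is assumed here to mean the per-class F1 score.

```python
# Class labels as listed in the abstract (index = class id).
LABELS = ["how much", "who", "what", "when", "where", "others"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Return an n x n matrix where rows are true classes, columns predicted."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_scores(matrix):
    """Return {class_id: (precision, recall, f1)} from a confusion matrix."""
    n = len(matrix)
    scores = {}
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n))  # column sum
        actual = sum(matrix[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores


# Hypothetical toy labels, for illustration only (not data from the thesis).
y_true = [0, 1, 2, 3, 4, 5, 1, 2]
y_pred = [0, 1, 2, 3, 4, 5, 2, 2]

matrix = confusion_matrix(y_true, y_pred, len(LABELS))
scores = per_class_scores(matrix)

print("accuracy:", accuracy(matrix))
for class_id, (p, r, f1) in scores.items():
    print(f"{LABELS[class_id]}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy run one "who" question is misclassified as "what", so the off-diagonal cell matrix[1][2] holds that single error, which is the kind of confusion between question types that a confusion matrix makes visible.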