dc.description.abstract |
Questioning is a primary means of acquiring knowledge in day-to-day activities. However, constructing
questions manually is time-consuming and expensive, and it requires domain experts. Automatic
question generation can therefore reduce both the construction time and the need for human labor.
Numerous studies have addressed question generation for high-resource languages such as English
and Chinese using a variety of recent techniques. For Amharic, however, only two works exist, and
both use traditional rule-based and template-based approaches. These approaches require
hand-crafted rules and templates, which are time-consuming and tedious to produce, and the
performance of the resulting models depends heavily on the size and quality of those rules and
templates. As a result, they do not scale well to large data. In addition, no Amharic question
generation dataset is publicly available. To overcome these problems, this study applies deep
learning to Amharic question generation, proposing a neural network approach combined with
rules. Because Amharic is a low-resource language for NLP, we construct rules to generate
questions. The input text is tokenized, normalized, filtered for stop words, and stemmed, and is
then fed to deep learning models (CNN, LSTM, and Bi-LSTM) that generate questions from the given
input. Because no question generation training dataset was available, the training data was
prepared manually, which was tedious and time-consuming.
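The preprocessing steps named above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's implementation: the homophone-normalization table (mapping the ሐ and ኀ series to ሀ, ሠ to ሰ, ዐ to አ, and ፀ to ጸ across all seven vowel orders), the tiny stop-word list, and the suffix list used by the crude light stemmer are hypothetical choices made for illustration. Practical Amharic stemmers usually transliterate the text before stripping affixes.

```python
import re

# Hypothetical homophone normalization: (variant series start, canonical series start).
# Each Ethiopic consonant series occupies 7 consecutive code points, one per vowel order.
_SERIES = [
    (0x1210, 0x1200),  # ሐ series -> ሀ series
    (0x1280, 0x1200),  # ኀ series -> ሀ series
    (0x1220, 0x1230),  # ሠ series -> ሰ series
    (0x12D0, 0x12A0),  # ዐ series -> አ series
    (0x1340, 0x1338),  # ፀ series -> ጸ series
]
_NORMALIZE = {var + i: canon + i for var, canon in _SERIES for i in range(7)}

# Whitespace, Ethiopic punctuation (U+1361..U+1368, e.g. ፡ ። ፣ ፧) and a few ASCII marks.
_SEPARATORS = re.compile(r"[\s\u1361-\u1368?!.,]+")

# Illustrative stop words only ("and", "is", "on", "inside"); not a curated list.
STOP_WORDS = {"እና", "ነው", "ላይ", "ውስጥ"}

# Hypothetical surface suffixes, checked longest first: plural ዎች / ች, accusative ን.
SUFFIXES = ("ዎች", "ች", "ን")


def normalize(text: str) -> str:
    """Map homophone characters to one canonical form."""
    return text.translate(_NORMALIZE)


def tokenize(text: str) -> list[str]:
    """Split on whitespace and punctuation, dropping empty pieces."""
    return [tok for tok in _SEPARATORS.split(text) if tok]


def stem(token: str, min_len: int = 2) -> str:
    """Strip at most one known suffix, keeping a stem of at least min_len characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_len:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, remove stop words, then stem each remaining token."""
    tokens = tokenize(normalize(text))
    return [stem(tok) for tok in tokens if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("አበበ እና ልጆችን ሄዱ።"))  # ['አበበ', 'ልጆች', 'ሄዱ']
```

Normalization runs first so that stop-word matching and suffix stripping see a single spelling for each homophone character; the resulting token lists are what a CNN, LSTM, or Bi-LSTM would then receive after embedding.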
The manually prepared dataset contains about 6,100 question-answer pairs, each labeled with one of
six classes: (0) how much, (1) who, (2) what, (3) when, (4) where, and (5) other. Accuracy,
precision, F-measure, and the confusion matrix are used to assess the models' overall
effectiveness on this dataset. In the third trial, LSTM, CNN, and Bi-LSTM achieved maximum
accuracies of 92%, 94%, and 95%, respectively. These results show that the proposed Bi-LSTM
handles Amharic question generation better than the other two models.
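The reported measures can be computed from a confusion matrix over the question-type labels (0) to (5). The following plain-Python sketch is illustrative only: the abstract does not say how precision and F-measure were averaged across classes, so macro-averaging over the classes that occur in the true or predicted labels is an assumption, and the toy label sequences below are made up rather than taken from the study.

```python
from typing import Sequence

# Question-type labels as listed in the abstract.
LABELS = {0: "how much", 1: "who", 2: "what", 3: "when", 4: "where", 5: "other"}


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int = 6) -> list[list[int]]:
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int = 6) -> dict:
    """Accuracy plus macro-averaged precision and F1 (averaging scheme is an assumption)."""
    matrix = confusion_matrix(y_true, y_pred, n_classes)
    total = sum(map(sum, matrix))
    correct = sum(matrix[c][c] for c in range(n_classes))
    precisions, f1s = [], []
    for c in range(n_classes):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(n_classes))  # column sum
        actual = sum(matrix[c])  # row sum
        if predicted == 0 and actual == 0:
            continue  # class absent from both sequences: leave it out of the average
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        f1s.append(f1)
    return {
        "accuracy": correct / total if total else 0.0,
        "macro_precision": sum(precisions) / len(precisions) if precisions else 0.0,
        "macro_f1": sum(f1s) / len(f1s) if f1s else 0.0,
        "confusion_matrix": matrix,
    }


if __name__ == "__main__":
    y_true = [0, 1, 2, 2, 3, 4, 5, 1]  # toy data, not from the study
    y_pred = [0, 1, 2, 3, 3, 4, 5, 2]
    scores = evaluate(y_true, y_pred)
    print(f"accuracy={scores['accuracy']:.3f} "
          f"macro_precision={scores['macro_precision']:.3f} "
          f"macro_f1={scores['macro_f1']:.3f}")
    for c, row in enumerate(scores["confusion_matrix"]):
        print(f"{LABELS[c]:>8}: {row}")
```

Off-diagonal cells show which question types are confused with each other; in the toy data, one "what" question is predicted as "when" and one "who" question as "what".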
Keywords: Amharic, Question Generator, Deep Learning, word2vec, Natural Language
Processing. |
en_US |