Abstract:
A Question Answering System (QAS) enables users to ask questions in natural language and receive concise, accurate answers from a pool of documents. Many people use the internet and other information platforms to research topics that interest them. This study focuses on factoid question categories, such as name, location, number, and time. Question answering systems have been developed for highly resourced languages like English, as well as for a few under-resourced languages such as Amharic and Geez. However, these systems do not work for Awugni, which has its own syntax and linguistic features, and adapting them would require additional manual feature engineering, external sources, or linguistic tools. To solve this problem, we propose Awugni Factoid Question Answering (AFQA), which uses a deep learning approach that learns features from documents during training. The proposed system has seven main components: data preprocessing, word embedding, LSTM/BiLSTM model building, model training, question classification, cosine similarity computation, and answer extraction. Our data was collected from various sources, namely the Amhara mass media/Awigna website, social media platforms (news from Telegram and Facebook), and the Bible (Orit lewawiyan), using the document analysis data collection method. Because the collected text is sequential, we built the system with LSTM and BiLSTM models. We trained each model for 10 epochs with a batch size of 8 and a hidden size of 128. Our dataset consists of around 40,000 sentences gathered from the above sources.
Using this data, we created over 3,500 question-answer pairs, which were divided in an 80:20 ratio for training/validation and testing. Specifically, 2,800 pairs were used for training/validation and 700 for testing. During testing, the LSTM achieved 95.94% accuracy and the BiLSTM achieved 97.85%; hence, BiLSTM is the best-performing model for the AFQA system.
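As an illustration of the cosine similarity computation listed among the components, the following is a minimal sketch, not the implementation used in this study. It assumes the question and each candidate answer sentence have already been encoded as fixed-length vectors; the function names `cosine_similarity` and `best_candidate` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; treat its similarity as 0.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_candidate(
    question_vec: Sequence[float],
    candidate_vecs: List[Sequence[float]],
) -> Tuple[int, float]:
    """Return the index and score of the candidate most similar to the question."""
    if not candidate_vecs:
        raise ValueError("at least one candidate is required")
    scores = [cosine_similarity(question_vec, c) for c in candidate_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy example with made-up 2-dimensional vectors (assumed, not real data).
question = [1.0, 0.0]
candidates = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
index, score = best_candidate(question, candidates)
```

In this sketch the highest-scoring candidate would be passed on to answer extraction; how the vectors are produced and how ties are handled are left open here.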
Keywords: Deep Learning, Awugni Factoid Questions, Question Classification, Question Analysis, Document Analysis, Answer Extraction