DESIGNING NEXT PHRASE PREDICTION MODEL FOR AMHARIC LANGUAGE USING DEEP LEARNING TECHNIQUES

WELELA, AMESALU

URI: http://ir.bdu.edu.et/handle/123456789/16473

Date: 2024-02

Abstract:

Text entry is an essential aspect of human-computer interaction and is typically performed with a keyboard whose layout mostly contains English letters. Typing Amharic text on a computer may therefore pose challenges such as reduced typing speed and spelling and grammar errors. These challenges motivate text prediction, which speeds up text entry on computers and handheld devices. Previous studies on Amharic next-word prediction lacked syntactic agreement because of inaccurate part-of-speech tagging. In addition, a single predicted word does not capture the context of a sentence. This study aims to design a next-phrase prediction model using deep learning approaches. The dataset was collected from Amharic student textbooks, Amharic teacher's guidebooks, the Amharic grammar book የአማርኛ ሰዋሰው (Amharic Grammar) by Baye Yimam, and news from Amhara mass media. Before model development, the collected Amharic sentences were preprocessed, part-of-speech tagged with a pre-trained model, and chunk tagged with a rule-based approach. Two prediction models were designed, one using LSTM and one using an Encoder-Decoder architecture, so that they could be compared and the better-performing one selected. The models were trained on 2176 simple declarative sentences using two split ratios for the training, validation, and testing sets: 80%, 10%, 10% and 70%, 15%, 15%. With the 80%, 10%, 10% split, the Encoder-Decoder and LSTM models achieved accuracies of 68.8% and 70.4%, respectively, so the LSTM model performs better than the Encoder-Decoder model. The findings of this study are valuable, especially for non-native users and users with dyslexia, in typing coherent sentences. The approach also captures the context of a sentence by predicting sequences of words rather than individual words.
This study was limited to declarative sentences and syntactic information; future research could extend it to other sentence types and incorporate semantic information.
Keywords: Phrase prediction, deep learning, Long Short-Term Memory, Encoder-Decoder, sentence chunk
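The abstract names the data pipeline stages without detailing them: part-of-speech tagging, rule-based chunk tagging, next-phrase prediction, and train/validation/test splits. The sketch below shows one plausible way those stages could fit together. It is not the thesis's implementation.

Several parts are hypothetical assumptions:

- the tag names (`N`, `ADJ`, `V`);
- the two chunk rules;
- the helper names `chunk`, `next_phrase_pairs`, and `split_sizes`;
- the floor-then-remainder rounding of split sizes.

The LSTM and Encoder-Decoder models themselves are not shown. The prefix-to-next-phrase pairs are simply the kind of training examples such a model could consume.

```python
# Hypothetical sketch: rule-based chunking over POS-tagged tokens, building
# next-phrase training pairs, and sizing train/validation/test splits.
# Tag names, chunk rules, and rounding are illustrative assumptions.
from typing import List, Tuple

Token = Tuple[str, str]         # (word, POS tag)
Chunk = Tuple[str, List[str]]   # (chunk label, words in the chunk)


def chunk(tokens: List[Token]) -> List[Chunk]:
    """Greedy chunker with two assumed rules: NP -> ADJ* N+ and VP -> V+.
    Any token not covered by a rule becomes a one-word chunk labeled by its tag."""
    chunks: List[Chunk] = []
    i = 0
    while i < len(tokens):
        # Try NP: zero or more adjectives followed by one or more nouns.
        j = i
        while j < len(tokens) and tokens[j][1] == "ADJ":
            j += 1
        k = j
        while k < len(tokens) and tokens[k][1] == "N":
            k += 1
        if k > j:
            chunks.append(("NP", [w for w, _ in tokens[i:k]]))
            i = k
            continue
        # Try VP: one or more verbs.
        k = i
        while k < len(tokens) and tokens[k][1] == "V":
            k += 1
        if k > i:
            chunks.append(("VP", [w for w, _ in tokens[i:k]]))
            i = k
            continue
        # Fallback: single-token chunk labeled with its own POS tag.
        chunks.append((tokens[i][1], [tokens[i][0]]))
        i += 1
    return chunks


def next_phrase_pairs(chunks: List[Chunk]) -> List[Tuple[List[str], str]]:
    """Turn a chunked sentence into (preceding phrases, next phrase) pairs."""
    phrases = [" ".join(words) for _, words in chunks]
    return [(phrases[:t], phrases[t]) for t in range(1, len(phrases))]


def split_sizes(n: int, train_pct: int, val_pct: int) -> Tuple[int, int, int]:
    """Floor the train and validation shares; the remainder goes to test."""
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return n_train, n_val, n - n_train - n_val


if __name__ == "__main__":
    # "Abebe built a big house", with hypothetical POS tags.
    sentence = [("አበበ", "N"), ("ትልቅ", "ADJ"), ("ቤት", "N"), ("ሠራ", "V")]
    chunks = chunk(sentence)
    print(chunks)  # [('NP', ['አበበ']), ('NP', ['ትልቅ', 'ቤት']), ('VP', ['ሠራ'])]
    for prefix, target in next_phrase_pairs(chunks):
        print(prefix, "->", target)
    print(split_sizes(2176, 80, 10))  # (1740, 217, 219)
    print(split_sizes(2176, 70, 15))  # (1523, 326, 327)
```

Under these assumptions, the 80%, 10%, 10% split of 2176 sentences gives 1740, 217, and 219 sentences. The 70%, 15%, 15% split gives 1523, 326, and 327. The thesis may have rounded or shuffled differently. A real chunker would also need rules for the full Amharic tagset produced by the pre-trained tagger.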
