Abstract:
Text entry is an essential aspect of human-computer interaction and can be performed
through a keyboard, which mostly contains English letters. Typing Amharic text on a
computer system may pose challenges such as reduced typing speed and spelling and grammar
errors. These challenges motivate the introduction of text prediction, which facilitates fast entry of
text into computers and handheld devices. Previous studies about Amharic next-word
prediction lacked syntactic agreement due to inaccurate part-of-speech tagging.
Additionally, predicting a single word did not capture the context of a sentence. This study aims to
design a next-phrase prediction model using deep learning approaches. The dataset for the
prediction model was collected from Amharic student textbooks, Amharic teachers'
guidebooks, the Amharic grammar book የአማርኛ ሰዋሰው by Baye Yimam, and news from
Amhara mass media. The collected Amharic sentences were preprocessed,
part-of-speech tagged with a pre-trained model, and chunk tagged with a rule-based approach for model
development. Two prediction models were designed using LSTM and Encoder-Decoder
deep learning techniques so that the better one could be selected. The prediction models
were trained on 2,176 simple declarative sentences using two split ratios, 80%/10%/10%
and 70%/15%/15%, for the training, validation, and test sets. On the former split, the
proposed models achieved accuracies of 68.8% (Encoder-Decoder) and 70.4% (LSTM).
The LSTM model therefore outperformed the Encoder-Decoder model on the
80%/10%/10% split of training, validation, and test sets. The
findings of this study are valuable, especially for non-native users and users with dyslexia, in
typing coherent sentences and in capturing the context of sentences by considering
sequences of words rather than individual words. This study was limited to declarative
sentences and syntactic information; future research could cover other sentence types
and incorporate semantic information.
Keywords: Phrase prediction, deep learning, Long Short-Term Memory, Encoder-Decoder,
sentence chunk