dc.description.abstract |
Natural language processing (NLP) has become one of the most active areas of research in recent
years. Spell checkers are one of the many NLP applications used to process human-language text.
Spelling errors are common in Amharic documents, and spell checkers that detect and correct them
have been built using a variety of machine learning techniques. Because Amharic is
Ethiopia's official language, it is used in numerous documents. Although earlier research on Amharic
spell checking relied on traditional rule-based methods and algorithms, it is worth examining
whether deep learning methods can achieve better accuracy. Earlier attempts at Amharic spell
checking neither took the context of a word within a phrase into account nor used deep learning to
solve the problem. Because of the homophonic structure of the language and the meanings of its
words, it is preferable to check the context of the entire sentence before making any corrections.
In this study, a context-based
spell checker system for the Amharic language is presented using deep learning. It attempts to
design and implement a spell checker that considers the meaning of the words before and after the
current word, that is, the word to be corrected. A dual-input deep learning model that captures the
context of the input word through separate left and right branches is proposed. The experiment uses
edit distance as the baseline approach. Both approaches are evaluated with the same amount of data
and the same training environment. The model is assessed by accuracy and loss, with sparse
categorical cross-entropy as the loss function. For
the system's training, testing, and evaluation, 116,274 unique words were gathered from various
sources. The system was trained on Google Colaboratory using Python. It was tested on sentences
containing deliberately misspelled words, and it was able to detect the misspelled word in a given
sentence and suggest corrections. In the experimental analysis, the proposed model achieved a lower
loss and a higher accuracy than the edit distance baseline: the baseline achieved an accuracy of
0.68, while the proposed dual-input encoder achieved an accuracy of 0.9349. To reduce
overcorrection and loss, an optimization process is applied during model training.
These results indicate that the context-based dual-input model is more effective than the baseline
method at detecting and correcting spelling errors. In future work, the system could be enhanced
with additional deep learning techniques and larger corpora to improve accuracy and reduce loss.
The approach could also be applied to other languages.
Keywords: Error Detection, Error Correction, Spell Checker, Amharic, Edit Distance, Dual Input
Encoder |
en_US |