Deep Learning Spell-Checker for Amharic Language

Maryamawit, Shumetie

Bahir Dar Institute of Technology (BiT) → Faculty of Electrical and Computer Engineering → Electrical Engineering → thesis

dc.contributor.author	Maryamawit, Shumetie
dc.date.accessioned	2022-11-18T08:26:37Z
dc.date.available	2022-11-18T08:26:37Z
dc.date.issued	2022-07
dc.identifier.uri	http://ir.bdu.edu.et/handle/123456789/14486
dc.description.abstract	Natural language processing (NLP) has recently become a very active area of research. Spell checkers are one of the many NLP applications used to process text in human languages. Spelling errors are common when writing Amharic documents, and spell checkers built with various approaches, including machine learning techniques, are used to detect and correct them. Because Amharic is Ethiopia's official language, it is used in numerous documents. Although earlier research on Amharic spell checking relied on traditional rule-based methods and algorithms, it is important to explore other approaches, such as deep learning, for better accuracy. Earlier attempts at Amharic spell checking did not take the context of a word in a phrase into account, and deep learning methods were not applied to the problem. Because of the homophonic structure of the language and the meaning of its words, it is preferable to check the context of the entire sentence before making any correction. This study presents a context-based spell checker for Amharic using deep learning. It attempts to design and implement a spell checker that considers the meaning of the words before and after the word to be corrected, i.e., the current word. The study proposes a deep learning dual-input model that considers the context of the input word in both its left and right branches. Edit distance is set up as the baseline approach, and both approaches are run with the same data and the same training environment. The model is assessed by accuracy and loss, with sparse categorical cross-entropy as the loss function. For the system's training, testing, and evaluation, 116,274 unique words were gathered from various sources. The system was trained on Google Colaboratory using Python.
It was tested by inserting misspelled words into sentences; it detected the misspelled word in a given sentence and gave suggestions for correcting it. In the experimental analysis, the proposed model achieved a lower loss and higher accuracy than the edit-distance baseline: edit distance achieved an accuracy of 0.68, whereas the proposed dual-input encoder achieved an accuracy of 0.9349. An optimization process is applied during model training to minimize overcorrection and loss. The results therefore indicate that the context-based dual-input model is more effective than the baseline method at detecting and correcting spelling errors. In the future, the system can be enhanced with additional deep learning techniques and larger corpora for better accuracy and lower loss, and the approach may also be applied to other languages. Keywords: Error Detection, Error Correction, Spell Checker, Amharic, Edit distance, Dual Input Encoder	en_US
dc.language.iso	en_US	en_US
dc.subject	Faculty of Electrical and Computer Engineering	en_US
dc.title	Deep Learning Spell-Checker for Amharic Language	en_US
dc.type	Thesis	en_US
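The abstract uses edit distance as the baseline. As a rough illustration only, the sketch below computes Levenshtein distance with the standard dynamic-programming recurrence and ranks vocabulary words by it. The function names `levenshtein` and `suggest`, the `max_distance` and `top_k` parameters, and the four-word sample vocabulary are hypothetical and are not taken from the thesis. Each Ethiopic syllable (for example ሰ or ላ) is a single Unicode code point, so the distance here is counted in syllables.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def suggest(word: str, vocabulary: list[str],
            max_distance: int = 2, top_k: int = 3) -> list[str]:
    """Hypothetical baseline: return up to top_k vocabulary words within
    max_distance of word, nearest first. A word already in the vocabulary
    is accepted as correct, regardless of its sentence context."""
    if word in vocabulary:
        return [word]
    scored = sorted((levenshtein(word, v), v) for v in vocabulary)
    return [v for d, v in scored if d <= max_distance][:top_k]


# Hypothetical sample vocabulary; not the thesis corpus.
vocabulary = ["ሰላም", "ሰማይ", "ቤት", "ልጅ"]
print(suggest("ሰላማ", vocabulary))  # ['ሰላም', 'ሰማይ']
```

Because a baseline like this looks only at the word itself, a misspelling that happens to be another valid word (for example a homophone) is accepted as correct, which is the limitation the abstract cites as motivation for checking the whole sentence.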
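The abstract describes a dual-input model whose left and right branches see the context before and after the current word, but it does not say how those inputs are built. The sketch below assumes a fixed window of `window` words on each side and a hypothetical `<pad>` token, and turns a tokenized sentence into (left context, right context, target) triples and then into integer ids. The names `context_pairs`, `build_vocab`, `encode` and `PAD` are illustrative only, not the thesis's code.

```python
PAD = "<pad>"  # hypothetical padding token (an assumption, not from the thesis)


def context_pairs(tokens: list[str], window: int = 2):
    """Yield (left, right, target) for every position in a tokenized sentence.

    left:  the `window` words before the target, left-padded with PAD
    right: the `window` words after the target, right-padded with PAD
    """
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        left = [PAD] * (window - len(left)) + left
        right = right + [PAD] * (window - len(right))
        yield left, right, target


def build_vocab(sentences: list[list[str]]) -> dict[str, int]:
    """Map each distinct word to an integer id; PAD gets id 0."""
    vocab = {PAD: 0}
    for sentence in sentences:
        for word in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(tokens: list[str], vocab: dict[str, int], window: int = 2):
    """Return id lists for the left branch and the right branch, plus
    integer target labels (the form a sparse categorical loss expects)."""
    lefts, rights, targets = [], [], []
    for left, right, target in context_pairs(tokens, window):
        lefts.append([vocab[w] for w in left])
        rights.append([vocab[w] for w in right])
        targets.append(vocab[target])
    return lefts, rights, targets


sentence = ["እኔ", "ቤት", "ሄድኩ"]  # sample sentence, tokenized on spaces
vocab = build_vocab([sentence])
print(encode(sentence, vocab, window=1))  # ([[0], [1], [2]], [[2], [3], [0]], [1, 2, 3])
```

Keeping the two contexts as separate inputs, rather than one concatenated sequence, is what lets each branch of a dual-input network read one side of the current word.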
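The abstract evaluates the model by accuracy and loss, with sparse categorical cross-entropy as the loss. "Sparse" means the targets are integer class ids (here, vocabulary indices) rather than one-hot vectors; the loss is the mean over examples of -log p_i[y_i], the negative log-probability the model assigns to the correct word. The plain-Python sketch below, with hypothetical function names and made-up probabilities, only shows how the two numbers are computed; it is not the thesis's evaluation code.

```python
import math


def sparse_categorical_crossentropy(probs: list[list[float]], labels: list[int]) -> float:
    """Mean negative log-probability of the true class.

    probs[i] is the predicted distribution over classes for example i;
    labels[i] is the integer id of the correct class (no one-hot encoding).
    """
    eps = 1e-12  # clip to avoid log(0)
    total = sum(-math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return total / len(labels)


def accuracy(probs: list[list[float]], labels: list[int]) -> float:
    """Fraction of examples whose highest-probability class equals the label."""
    hits = sum(
        max(range(len(p)), key=p.__getitem__) == y for p, y in zip(probs, labels)
    )
    return hits / len(labels)


# Made-up predictions for three examples over a three-word vocabulary.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
labels = [0, 1, 0]
print(round(sparse_categorical_crossentropy(probs, labels), 4))  # 0.5946
print(accuracy(probs, labels))  # 0.6666666666666666
```

The loss penalizes low confidence in the correct word even when the top prediction is right, so it complements accuracy, which only counts whether the highest-probability word matches the label.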