dc.description.abstract |
Spellchecking is the process of detecting incorrectly spelled words in a text and suggesting close
corrections for them. Spelling errors arise when people use text processing applications to
produce electronic documents. Based on their context, there are two types of spelling error: non-word errors and real-word errors. A non-word error is a misspelling that produces a string with no
meaning in the language. A real-word error is a word that has meaning in the language but is
contextually incorrect. Both types of error are a major problem in Ge’ez documents, particularly
in the EOTC, where Ge’ez is used for spiritual services and the documents are written
in Ge’ez. This study addresses the non-word spelling error problem for the Ge’ez language.
Currently, no spellchecker has been developed for the Ge’ez language. In this study, an effective
spellchecker system is designed and then implemented in the Python programming language.
The proposed system architecture contains three processing components: a preprocessing component,
which breaks a given block of text into tokens; an error detection component, which verifies the validity
of words; and an error correction component, which provides close suggestions for
invalid words. It also contains three backend components: a Ge’ez word dictionary, morphological rules,
and a Ge’ez character database. Dictionary lookup, N-gram analysis, and morphology-based
approaches are used for both error detection and correction. A morphology-based approach is
appropriate because Ge’ez is a morphologically rich language; we designed 1052 morphological
rules, and suggestions are generated using the Damerau-Levenshtein edit distance algorithm.
This study follows the design science research methodology, which entails six steps: problem identification
and motivation, setting goals, designing and developing, demonstrating, evaluating, and
communicating. The system was evaluated using evaluation metrics. According to the
experimental results, the system achieved an Error Recall of 84.3%, an Error Precision of 67.8%, a Lexical
Recall of 95.8%, a Lexical Precision of 98.3%, and an Accuracy of 94.7% for detecting errors, and it
generated corrections for 76% of the correctly identified invalid words. The proposed system mainly focuses
on typographic errors, so further work is needed to extend it to real-word
errors and to improve the morphological rule definitions by adding more word classes.
Keywords: Morphology, Non-word error, Error detection, Error correction, Spellchecker. |
en_US |
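The abstract describes detecting non-word errors by dictionary lookup and ranking suggestions by Damerau-Levenshtein edit distance. The following is only a minimal sketch of that idea, not the system built in the study. It assumes the optimal string alignment variant of the distance (a restricted form of Damerau-Levenshtein in which no substring is edited more than once), and the names `osa_distance`, `suggest`, `max_distance`, and `limit`, as well as the small Latin-script placeholder lexicon, are hypothetical and stand in for the Ge’ez word dictionary. The morphological rules and N-gram analysis mentioned in the abstract are not modeled here.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each at cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def suggest(word, lexicon, max_distance=2, limit=5):
    """Return up to `limit` lexicon entries within `max_distance` of `word`.

    A word found in the lexicon is treated as valid and gets no suggestions.
    The threshold and limit are illustrative assumptions.
    """
    if word in lexicon:
        return []
    scored = ((osa_distance(word, cand), cand) for cand in lexicon)
    close = sorted((dist, cand) for dist, cand in scored if dist <= max_distance)
    return [cand for _, cand in close[:limit]]


# Placeholder lexicon; a real system would load the Ge'ez word dictionary.
LEXICON = {"word", "ward", "sword"}
print(suggest("wrod", LEXICON))
```

Candidates are ranked by distance first and alphabetically among ties, so a single transposition such as "wrod" for "word" is offered ahead of candidates that need two edits. A linear scan over the lexicon is the simplest choice for a sketch; a large dictionary would likely need an index or candidate pre-filtering.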