Abstract:
The Ge'ez language is an ancient Ethiopian Semitic language that is still used in liturgical contexts and taught at university and college levels. It lacks part-of-speech taggers, morphological analyzers, and tools for detecting syntax errors in written text. This gap hinders the identification of syntax errors and poses a significant challenge for learners and researchers. This study addresses the problem by developing morphological part-of-speech tagging (Gz-POS) and syntax error detection (Gz-SED) models for Ge'ez using deep learning approaches. For part-of-speech tagging, a dataset of 4,981 sentences containing 30,326 words (11,747 unique words) was collected. For syntax error detection, a separate dataset of 1,170 sentences was collected. LSTM and BiLSTM algorithms were used to develop the models. The LSTM model achieved accuracies of 94% and 92.31% on the Gz-POS and Gz-SED tasks, respectively. The BiLSTM model achieved accuracies of 95.01% and 94.02% on the same tasks. The BiLSTM model therefore outperformed the LSTM model on both tasks, by 1.01 and 1.71 percentage points, respectively. These results demonstrate the effectiveness of deep learning algorithms for part-of-speech tagging and syntax error detection in Ge'ez. The developed models offer a feasible step toward addressing the challenges of digitizing Ge'ez books and provide support for second-language learners. The findings contribute to language education, research, and development in under-resourced languages. Future researchers can use the developed models and methodology as a framework for further advances in Ge'ez language processing.
Keywords: Ge'ez Morphological POS Tagging, Ge'ez Syntax Error Detection, Deep Learning, LSTM, BiLSTM.