Abstract:
Ge’ez is an ancient Semitic language used predominantly for religious texts in
Ethiopia and Eritrea, with distinctive semantic, syntactic, and morphological
characteristics. This study develops a Named Entity Recognition (NER) model
for Ge’ez using deep learning, addressing the lack of existing NER solutions
for the language. The methodology includes dataset collection
from Ethiopian Orthodox Tewahido Church sources and corpus preparation for named
entity recognition. The collected dataset comprises 5,685 sentences and 27,154
words, of which 10,326 are unique. The dataset was labeled using a set of
27 tags, followed by text preprocessing, model training, and testing.
Hyperparameter tuning with the DEAP framework was used to optimize model
performance. Two experiments were run with each of two deep learning models,
LSTM and BiLSTM: named entity recognition without and with part-of-speech (POS)
information. Without POS, LSTM achieved 94.83% accuracy and BiLSTM achieved
95.24%; with POS, LSTM achieved 96.72% and BiLSTM achieved 97.14%. These
results indicate that POS information improves Ge’ez NER accuracy for both
models and that BiLSTM slightly outperforms LSTM in both settings.
The study contributes to the advancement of natural language processing for Ge’ez.
Keywords: Ge’ez Language, Ge’ez Named Entity Recognition, Deep Learning, LSTM,
BiLSTM.
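
The corpus figures reported in the abstract (sentences, total words, unique words, and tag inventory) can be derived from a tagged corpus along the following lines. This is a minimal sketch, not the authors' code: the function name `corpus_stats`, the (word, tag) pair representation, the placeholder tokens, and the tag labels such as `B-PER` are all assumptions made for illustration.

```python
from collections import Counter

def corpus_stats(tagged_sentences):
    """Summarize a tagged corpus given as a list of sentences,
    where each sentence is a list of (word, tag) pairs."""
    words = [word for sent in tagged_sentences for word, _ in sent]
    tag_counts = Counter(tag for sent in tagged_sentences for _, tag in sent)
    return {
        "sentences": len(tagged_sentences),
        "words": len(words),
        "unique_words": len(set(words)),
        "tag_counts": tag_counts,
    }

# Hypothetical toy corpus; tokens and tag names are placeholders,
# not drawn from the study's dataset.
toy_corpus = [
    [("w1", "B-PER"), ("w2", "O"), ("w3", "B-LOC")],
    [("w1", "B-PER"), ("w4", "O")],
]

stats = corpus_stats(toy_corpus)
print(stats["sentences"], stats["words"], stats["unique_words"])
print(sorted(stats["tag_counts"]))
```

Applied to the full dataset, the same counts would correspond to the 5,685 sentences, 27,154 words, and 10,326 unique words stated above, with the number of distinct keys in `tag_counts` giving the size of the tag inventory.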