Abstract:
This study explores Ge’ez-Amharic machine translation using a recurrent neural network (RNN). Ge’ez and Amharic are both morphologically rich languages, which makes translation at the word level difficult. Data scarcity and the lack of a well-prepared corpus are further challenges for these languages. In addition, at the word level it is difficult to manage the many forms of a single word, and word-level units are not specific and lack consistency.
For the experiments, a morpheme-based, sentence-level parallel corpus was collected and prepared from online sources. These sources include the Old Testament of the Holy Bible, Kidassie or Anaphora (የሐዋርያት እና የዲዮስቆሮስ ቅዳሴ), and Psalms (መዝሙረ-ዳዊት), prepared by Ge’ez experts, as well as different Ge’ez education books that were typed manually. To make the corpus suitable for the system, preprocessing tasks, namely cleaning and normalization, were applied. The prepared data set contains a total of 11,133 simple and complex morpheme-based sentences, of which 2,000 were used for our model: 80% for training, 10% for validation, and the remaining 10% for testing.
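As an illustration of the 80/10/10 partition described above, the following is a minimal sketch only. The function name `split_corpus`, the fixed random seed, the shuffling step, and the placeholder sentence pairs are assumptions made for this example and are not taken from the study.

```python
import random

def split_corpus(pairs, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle sentence pairs and split them into train/validation/test lists.

    Hypothetical helper: the shuffling and the seed are illustrative choices.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is the test set
    return train, val, test

# Placeholder pairs standing in for the 2,000 sentence pairs used for the model.
pairs = [(f"geez_{i}", f"amharic_{i}") for i in range(2000)]
train, val, test = split_corpus(pairs)
print(len(train), len(val), len(test))  # 1600 200 200
```

With 2,000 sentence pairs, this gives 1,600 pairs for training and 200 each for validation and testing.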
The encoder-decoder network in the neural machine translation (NMT) architecture was built with long short-term memory (LSTM) networks, and the resulting models were evaluated automatically using the bilingual evaluation understudy (BLEU) metric. After the corpus and the prototype were prepared, different experiments were conducted with the Adam and RMSProp optimizers, each combined with the sigmoid and softmax activation functions. The best model accuracy was obtained with RMSProp and the sigmoid activation function over 200 epochs, with a test accuracy of 98.33% and a loss of 0.068.
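To make the BLEU metric mentioned above concrete, the sketch below computes an unsmoothed, single-reference, sentence-level BLEU score from clipped n-gram precisions and a brevity penalty. It is a simplified illustration, not the evaluation code used in the study: the function names, the choice of n-grams up to length 4, and the placeholder morpheme tokens are all assumptions.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed single-reference BLEU for two token lists (illustrative)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)

# Placeholder morpheme tokens, not real corpus data.
reference = ["m1", "m2", "m3", "m4", "m5"]
print(sentence_bleu(reference, reference))      # exact match scores 1.0
print(sentence_bleu(reference, reference[:4]))  # shorter output is penalized
```

In practice, corpus-level BLEU with smoothing from an established toolkit would usually be preferred over a hand-written version like this one.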