BDU IR

DESIGNING GEEZ NEXT WORD PREDICTION MODEL USING STATISTICAL APPROACH

dc.contributor.author WONDIFRAW, MANAYE MAMO
dc.date.accessioned 2021-10-14T06:22:07Z
dc.date.available 2021-10-14T06:22:07Z
dc.date.issued 2020-08-13
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/12746
dc.description.abstract People spend an increasing amount of time on their mobile and accessibility electronic devices. We enter data into these devices through an input device and a processing program, but typing character by character is time consuming and error prone. Text prediction is one technique that eases data entry on computers and other devices, and word prediction is one kind of text prediction used in user support systems. Word prediction can also serve as input for future research and support other NLP applications, such as assisting people with disabilities and texting on mobile phones or PDAs. Ge’ez is a Semitic language in which histories and romances, legal, mathematical, and medical texts, and the ancient philosophy, tradition, history, and knowledge of Ethiopia were written. These writings are not openly accessible to everyone because they have not been digitized, and digitizing Ge’ez text is time consuming because typing it requires key combinations. The main objective of this research is to design a Ge’ez word prediction model using a statistical machine learning approach, specifically an n-gram language model. We follow a design science methodology. We collect and preprocess a dataset, have it morphologically analyzed by Ge’ez language experts, and then generate n-gram sequences. Stem/root n-gram sequences are used to predict the probable root or stem. The probable morphological features of the predicted stem/root words are then predicted using statistical information about affixes. Finally, candidate surface words are generated from the desired stem/root words and affixes. We handle context and grammatical agreement by incorporating higher-order n-grams and a custom Ge’ez word vector. When the user types a space, Word2vec checks context and grammar against the surrounding words and corrects the text. We evaluate the model on test data using keystroke savings as the evaluation metric.
To evaluate the prediction model, we conduct three experiments. Experiments 1 and 2 use a morphologically tagged dataset, with a smaller and a larger data size respectively, whereas experiment 3 does not consider morphological features. The best result comes from experiment 2: 35.7% keystroke savings for a hybrid of n-gram sequences with a back-off smoothing model. en_US
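The abstract names two technical ingredients: an n-gram predictor that backs off to lower-order statistics, and keystroke savings as the evaluation metric. The sketch below illustrates both on a toy scale. It is not the thesis implementation. It assumes a bigram model that falls back to unigram counts for ranking (no discounting, so it is simpler than a full back-off smoothing scheme), assumes that selecting a suggestion costs one keystroke and supplies the trailing space, and uses placeholder Latin tokens rather than real Ge’ez data. The function names `train`, `predict`, and `keystroke_savings` are hypothetical.

```python
from collections import Counter, defaultdict


def train(sentences):
    """Count unigrams and bigram continuations from tokenized sentences."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for tokens in sentences:
        unigrams.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            bigrams[prev][nxt] += 1
    return unigrams, bigrams


def predict(prev, prefix, unigrams, bigrams, k=3):
    """Return up to k candidates that start with prefix.

    Bigram continuations of prev are ranked first; remaining slots are
    filled from unigram counts (a simple back-off used only for ranking).
    """
    def ranked(counter):
        ordered = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        return [word for word, _ in ordered if word.startswith(prefix)]

    candidates = ranked(bigrams.get(prev, {}))
    for word in ranked(unigrams):
        if word not in candidates:
            candidates.append(word)
    return candidates[:k]


def keystroke_savings(sentences, unigrams, bigrams, k=3):
    """Percentage of keystrokes saved compared with typing every character.

    Assumption: each word costs len(word) + 1 keystrokes without prediction
    (letters plus a space); picking a suggestion costs one keystroke.
    """
    baseline = 0
    pressed = 0
    for tokens in sentences:
        prev = None
        for word in tokens:
            baseline += len(word) + 1
            cost = len(word) + 1  # worst case: the word is never suggested
            for typed in range(len(word)):
                if word in predict(prev, word[:typed], unigrams, bigrams, k):
                    cost = typed + 1  # letters typed so far plus one selection
                    break
            pressed += cost
            prev = word
    return 100.0 * (baseline - pressed) / baseline


# Placeholder tokens standing in for Ge'ez words.
corpus = [["a", "bb", "cc"], ["a", "bb", "dd"], ["a", "bb", "cc"]]
unigrams, bigrams = train(corpus)
print(predict("bb", "", unigrams, bigrams))
print(keystroke_savings([["a", "bb", "cc"]], unigrams, bigrams))
```

Under these assumptions, a higher percentage means the user reaches the intended text with fewer key presses. The reported figure of 35.7% comes from the thesis experiments and cannot be reproduced from this toy example.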
dc.language.iso en_US en_US
dc.subject INFORMATION TECHNOLOGY en_US
dc.title DESIGNING GEEZ NEXT WORD PREDICTION MODEL USING STATISTICAL APPROACH en_US
dc.type Thesis en_US

