DESIGNING GEEZ NEXT WORD PREDICTION MODEL USING STATISTICAL APPROACH

WONDIFRAW, MANAYE MAMO

dc.contributor.author	WONDIFRAW, MANAYE MAMO
dc.date.accessioned	2021-10-14T06:22:07Z
dc.date.available	2021-10-14T06:22:07Z
dc.date.issued	2020-08-13
dc.identifier.uri	http://ir.bdu.edu.et/handle/123456789/12746
dc.description.abstract	People spend an increasing amount of time on mobile and other electronic devices, and they enter data into these devices through input devices and processing programs. However, entering text character by character is time-consuming and error-prone. Text prediction is one technique that facilitates data entry into computers and other devices, and word prediction is one kind of text prediction used in user support systems. Word prediction can serve as a basis for future research and can support other NLP applications, such as assisting people with disabilities and texting on mobile phones or PDAs. Ge’ez is a Semitic language in which histories, romances, legal, mathematical, and medical texts, ancient philosophy, and the traditions, history, and knowledge of Ethiopia were written. However, these writings are not openly accessible because they have not been digitized, and digitizing Ge’ez text is time-consuming because typing its characters requires key combinations. The main objective of this research is to design a Ge’ez word prediction model using a statistical machine learning approach, specifically an n-gram language model. We follow the design science methodology. We collect and preprocess a dataset, which Ge’ez language experts analyze morphologically, and then generate n-gram sequences from it. Stem/root n-gram sequences are used to predict the most probable stem or root. The probable morphological features of the predicted stem/root are then predicted using statistical information about affixes. Finally, candidate surface words are generated from the selected stems/roots and affixes. We handle context and grammatical agreement by incorporating higher-order n-grams and a custom Ge’ez word vector model: when the user enters a space, Word2vec checks the context and grammar against the surrounding words and corrects the prediction. We evaluate the model on test data using keystroke savings as the evaluation metric.
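The stem-then-affix prediction pipeline described above can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the thesis implementation: the toy corpus of hypothetical (prefix, stem, suffix) triples, the function names `train` and `predict`, the use of a stem bigram only (the thesis also uses higher-order n-grams), conditioning affix probabilities on the stem, and forming surface words by plain concatenation, which ignores Ge’ez internal morphology. Candidates are ranked by P(stem | previous stem) × P(affixes | stem).

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each sentence is a list of (prefix, stem, suffix)
# triples standing in for an expert morphological annotation. Not real Ge'ez data.
CORPUS = [
    [("", "A", ""), ("wa", "B", "u"), ("", "C", "")],
    [("", "A", ""), ("", "B", "a"), ("", "D", "")],
    [("", "A", ""), ("wa", "B", "u"), ("", "C", "")],
]


def train(corpus):
    """Count stem bigrams and, for each stem, its observed (prefix, suffix) pairs."""
    stem_bigrams = defaultdict(Counter)
    affixes = defaultdict(Counter)
    for sentence in corpus:
        stems = ["<s>"] + [stem for _, stem, _ in sentence]
        for prev, cur in zip(stems, stems[1:]):
            stem_bigrams[prev][cur] += 1
        for prefix, stem, suffix in sentence:
            affixes[stem][(prefix, suffix)] += 1
    return stem_bigrams, affixes


def predict(prev_stem, stem_bigrams, affixes, k=3):
    """Rank surface-word candidates by P(stem | prev_stem) * P(affixes | stem)."""
    following = stem_bigrams.get(prev_stem, Counter())
    total = sum(following.values())
    scored = []
    for stem, count in following.items():
        p_stem = count / total
        affix_counts = affixes[stem]
        affix_total = sum(affix_counts.values())
        for (prefix, suffix), a_count in affix_counts.items():
            # Simplifying assumption: surface form = prefix + stem + suffix.
            scored.append((p_stem * a_count / affix_total, prefix + stem + suffix))
    # Highest probability first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]


if __name__ == "__main__":
    stem_bigrams, affixes = train(CORPUS)
    print(predict("A", stem_bigrams, affixes))  # ['waBu', 'Ba']
```

Splitting the model into a stem model and an affix model keeps each table small, which is one plausible reason to predict stems first for a morphologically rich language; an unseen previous stem simply yields no candidates here, where a real system would back off.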
To evaluate the prediction model, we conduct three experiments. Experiments 1 and 2 use the morphologically tagged dataset, with a smaller and a larger data size respectively, whereas experiment 3 does not consider morphological features. Experiment 2 gives the best result: 35.7% keystroke savings for the hybrid of n-gram sequences with a back-off smoothing model.	en_US
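How a back-off n-gram predictor is scored with keystroke savings can be sketched in Python. This is a hedged illustration under stated assumptions, not the thesis setup: it uses the simple "stupid backoff" scheme (trigram, then bigram, then unigram, multiplying by a fixed `alpha` per dropped order) rather than a normalized scheme such as Katz back-off, and it defines keystroke savings as KSS = 100 × (KT − KP) / KT, where KT counts keystrokes without prediction and KP with prediction. Ignoring spaces and charging one key to accept a suggestion are assumptions, as are the toy placeholder data and the names `train`, `score`, `suggest`, and `keystroke_savings`.

```python
from collections import Counter

ORDER = 3  # trigram model

# Hypothetical toy training data with placeholder tokens; not the thesis corpus.
TRAIN = ["alpha beta gamma", "alpha beta delta", "alpha beta gamma"]


def train(sentences, order=ORDER):
    """Count all 1..order-grams and the histories they extend."""
    ngrams, histories = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] * (order - 1) + sentence.split()
        for i in range(order - 1, len(tokens)):
            for n in range(1, order + 1):
                gram = tuple(tokens[i - n + 1 : i + 1])
                ngrams[gram] += 1
                histories[gram[:-1]] += 1  # the empty history () holds the unigram total
    return ngrams, histories


def score(word, context, ngrams, histories, alpha=0.4):
    """Stupid-backoff score: use the longest history seen with `word`,
    multiplying by `alpha` for every order dropped."""
    penalty = 1.0
    while True:
        gram = context + (word,)
        if ngrams[gram] > 0:
            return penalty * ngrams[gram] / histories[context]
        if not context:
            return 0.0
        context = context[1:]
        penalty *= alpha


def suggest(context, prefix, ngrams, histories, k=3):
    """Top-k vocabulary words starting with `prefix`, ranked by back-off score."""
    vocab = [g[0] for g in ngrams if len(g) == 1 and g[0].startswith(prefix)]
    vocab.sort(key=lambda w: (-score(w, context, ngrams, histories), w))
    return vocab[:k]


def keystroke_savings(test_sentences, ngrams, histories, k=3, order=ORDER):
    """KSS = 100 * (KT - KP) / KT. Assumes spaces are not counted and that
    accepting a suggestion costs one keystroke."""
    without = with_pred = 0
    for sentence in test_sentences:
        history = ["<s>"] * (order - 1)
        for word in sentence.split():
            context = tuple(history[len(history) - (order - 1):])
            without += len(word)
            cost = len(word)  # worst case: the user types the whole word
            for typed in range(len(word)):
                if word in suggest(context, word[:typed], ngrams, histories, k):
                    cost = typed + 1  # characters typed so far plus one selection key
                    break
            with_pred += cost
            history.append(word)
    return 100.0 * (without - with_pred) / without


if __name__ == "__main__":
    ngrams, histories = train(TRAIN)
    print(round(keystroke_savings(["alpha beta gamma"], ngrams, histories, k=1), 1))  # 78.6
```

The usage example scores a sentence that also occurs in the toy training data, so its figure is optimistic and unrelated to the 35.7% reported above. Stupid backoff is used only because it needs no discounting; its scores are not true probabilities, which is acceptable for ranking suggestions but not for computing perplexity.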
dc.language.iso	en_US	en_US
dc.subject	INFORMATION TECHNOLOGY	en_US
dc.title	DESIGNING GEEZ NEXT WORD PREDICTION MODEL USING STATISTICAL APPROACH	en_US
dc.type	Thesis	en_US