BDU IR

DEPARTMENT OF INFORMATION TECHNOLOGY PART OF SPEECH TAGGING FOR AWNGI LANGUAGE

dc.contributor.author WONDMANEH, BIRHANE
dc.date.accessioned 2020-10-07T11:04:16Z
dc.date.available 2020-10-07T11:04:16Z
dc.date.issued 2020
dc.identifier.uri http://hdl.handle.net/123456789/11280
dc.description.abstract Natural Language Processing (NLP) has emerged as a means of increasing computers' capability to understand natural languages, in which most human knowledge is recorded. Part-of-Speech (POS) tagging is one of the tasks of NLP: it labels every word of a text with its correct part-of-speech category, such as noun, verb, adjective, adverb, preposition, or conjunction, based on its definition and the context of adjacent and related words. Awngi is a language of the Cushitic family, spoken by more than 1.5 million people in Amhara and some parts of Benishangul Gumuz Regional States. It has been one of the under-resourced languages in terms of both electronic resources and processing tools. Consequently, many natural language processing tasks remain open for researchers to investigate. Among the different research areas of NLP, we focus on part-of-speech tagging because it is a primary and fundamental task. The output of POS tagging can be used as input for grammar checkers, spell checkers, information extraction, information retrieval, speech synthesis, text parsing, semantic processing, etc. The main motivation for this resource is to obtain data for training automatic taggers with a machine learning approach. Hence, we take machine learning considerations into account during tagset design and present training experiments as part of this paper. No Awngi corpus is available in an organized form. As a result, with the help of experts, we prepared and manually tagged the training data set. The data were collected from Injibara elementary and high school Awngi textbooks, from Amhara Mass Media Agency Awngi radio and television programs, and from the Awi zone administration office. To reduce the complexity of part-of-speech tagging, we used the Hidden Markov Model statistical approach.
For the selected approach, the N-gram Viterbi algorithm is used for tagging. To obtain the results, we used the Python programming language with tenfold cross-validation as the evaluation mechanism. The average accuracy of the tagger is 91.3%, which is significant for future researchers who want to pursue NLP research on Awngi in particular and on other local languages in general. en_US
dc.language.iso en en_US
dc.subject Information Technology en_US
dc.title DEPARTMENT OF INFORMATION TECHNOLOGY PART OF SPEECH TAGGING FOR AWNGI LANGUAGE en_US
dc.type Thesis en_US
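
The abstract describes a Hidden Markov Model tagger decoded with an N-gram Viterbi algorithm. The following is a minimal sketch of that general technique, not the thesis's implementation. It assumes a bigram (first-order) model with add-one smoothing, and it uses a tiny hypothetical English corpus in place of the Awngi training data, which is not part of this record. All function and variable names are illustrative.

```python
import math
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(lambda: defaultdict(int))
    emissions = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
        transitions[prev][END] += 1
    return transitions, emissions, sorted(emissions), vocab


def log_prob(counts, context, event, n_outcomes):
    """Add-one smoothed log P(event | context); smoothing is an assumption."""
    row = counts.get(context, {})
    return math.log((row.get(event, 0) + 1) / (sum(row.values()) + n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for words under a bigram HMM."""
    if not words:
        return []
    transitions, emissions, tags, vocab = model
    n_trans = len(tags) + 1   # any tag, or the end marker
    n_emit = len(vocab) + 1   # known words plus one slot for unknown words
    # best[i][tag] = (log score of the best path ending in tag, previous tag)
    best = [{
        t: (log_prob(transitions, START, t, n_trans)
            + log_prob(emissions, t, words[0], n_emit), None)
        for t in tags
    }]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            emit = log_prob(emissions, t, words[i], n_emit)
            column[t] = max(
                (best[i - 1][p][0] + log_prob(transitions, p, t, n_trans) + emit, p)
                for p in tags
            )
        best.append(column)
    # Pick the best final tag, including the transition to the end marker.
    last = max(tags, key=lambda t: best[-1][t][0]
               + log_prob(transitions, t, END, n_trans))
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])  # follow backpointers
    return path[::-1]


# Toy English data for illustration only; not taken from the Awngi corpus.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmm(TOY_CORPUS)
print(viterbi(["the", "cat", "barks"], model))  # ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numeric underflow on long sentences, and the extra emission slot lets the sketch assign a tag to words it never saw in training. A tenfold cross-validation like the one the abstract reports would repeat training and tagging over ten train/test splits of the tagged corpus and average the per-split accuracy.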

