BDU IR

AN INTELLIGENT SYSTEM FOR AUTOMATED AMHARIC TEXT CATEGORIZATION USING MACHINE LEARNING TECHNIQUES

dc.contributor.author BESHAHUWIRED, NIGUS
dc.date.accessioned 2020-03-24T05:52:08Z
dc.date.available 2020-03-24T05:52:08Z
dc.date.issued 2020-03-24
dc.identifier.uri http://hdl.handle.net/123456789/10778
dc.description.abstract Text categorization (classification) is the process of assigning documents to one or more predefined categories based on their content. This thesis presents an intelligent Amharic text categorization system built with machine learning techniques. Document classification is challenging because the number of discriminating words can be very large. Amharic is a Semitic language with a richer and more complex morphology than English, so it requires a set of preprocessing routines. Stop words such as prepositions, conjunctions, and particles are considered insignificant and are removed. An affix removal algorithm is used for stemming to reduce dimensionality, and each document is represented as a weighted vector. The experiments were performed on 1514 news documents collected from selected available websites. To build the classification models for Amharic documents, the NB, J48, and SMO classifiers are used. Based on the experimental results, the SMO support vector machine outperforms the models constructed using either NB or J48 on the TF-IDF feature selection schemes. Quantitatively, the best results, 88.8% accuracy, 89.0% precision, and 88.8% recall, are achieved for TF-IDF feature selection using the SMO support vector machine. This is a promising result and motivates further investigation of an appropriate Amharic text categorization model using the SMO support vector machine. en_US
dc.language.iso en en_US
dc.subject Computer Science en_US
dc.title AN INTELLIGENT SYSTEM FOR AUTOMATED AMHARIC TEXT CATEGORIZATION USING MACHINE LEARNING TECHNIQUES en_US
dc.type Thesis en_US
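
The preprocessing and weighting steps named in the abstract (stop-word removal, affix stripping, weighted document vectors) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the stop-word set, the suffix list, the function names, and the tf * log(N / df) weighting formula are all assumptions chosen for illustration, and the sample tokens are romanized placeholders rather than a real Amharic resource.

```python
import math
from collections import Counter

# Hypothetical placeholders: NOT a real Amharic stop-word list or affix table.
STOP_WORDS = {"na", "wede"}
SUFFIXES = ("och", "u")


def strip_suffix(token):
    """Remove the first matching suffix, keeping a stem of at least two characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, split on whitespace, drop stop words, then strip suffixes."""
    return [strip_suffix(t) for t in text.lower().split() if t not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    sample = ["betoch na sport", "sport wede ketema"]
    for vector in tf_idf(sample):
        print(vector)
```

Under this assumed formula, a term that occurs in every document receives weight zero, which is one reason such weighting helps with the large vocabulary the abstract mentions. The resulting per-document dictionaries could then be passed to a classifier; the abstract reports using NB, J48, and SMO for that step.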

