BDU IR

Designing a Hybrid Dimension Reduction for Improving Performance of Amharic News Document Classification

dc.contributor.author Endalie, Demeke
dc.date.accessioned 2020-06-04T06:55:20Z
dc.date.available 2020-06-04T06:55:20Z
dc.date.issued 2020-05
dc.identifier.uri http://hdl.handle.net/123456789/10881
dc.description.abstract The main objective of natural language processing is to enable computers to perform tasks that would otherwise require human involvement, saving the labor, cost and time devoted to such tasks. These goals are pursued through activities such as text classification, speech recognition, and information retrieval. Text classification is one such task; however, classification accuracy decreases and computational complexity increases as the number of categories grows. The aim of this study is to explore and design a dimensionality reduction scheme for Amharic document classification using feature selection and feature extraction. To achieve this objective, design an effective model, and establish the state of the art, the relevant literature was reviewed. The designed dimension reduction scheme consists of information gain, chi-square, and document frequency as feature selection methods with local thresholding, followed by Principal Component Analysis (PCA) to further refine the selected features. NetBeans 8.1 and Python were used to pre-process the data and to build the model, respectively. The new dimension reduction scheme was evaluated on Amharic news documents and achieved 82.77% accuracy. It was also compared with other dimensionality reduction systems and feature merging strategies. The new scheme reduces the number of features produced by information gain, chi-square, and document frequency by 64.07%, 74%, and 50.63% respectively, and the training time increases by only 20 seconds as the number of categories increases from three to thirteen. Although the proposed dimension reduction slows the growth in computational time, classification accuracy still decreases, at a diminishing rate, as the feature set is reduced to lower the computational cost. Therefore, there is a need to apply genetic algorithms over the selected features, since they determine which features to remove by observing the classification accuracy. en_US
dc.language.iso en en_US
dc.subject Computer Science en_US
dc.title Designing a Hybrid Dimension Reduction for Improving Performance of Amharic News Document Classification en_US
dc.type Thesis en_US
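The feature selection stage named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that "local thresholding" means keeping the top k terms per category, that the three selectors are merged by union, and that document frequency is counted within the category. The toy English tokens, the function names term_scores and select_features, and the parameter k are all hypothetical. The PCA refinement step is not shown.

```python
import math


def _entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def term_scores(docs, labels, category):
    """Score every term for one category with DF, chi-square and information gain.

    docs is a list of token lists; labels holds one category name per document.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    n_pos = sum(1 for y in labels if y == category)
    scores = {}
    for t in vocab:
        # 2x2 contingency table: term present/absent vs. inside/outside category
        a = sum(1 for s, y in zip(doc_sets, labels) if t in s and y == category)
        b = sum(1 for s, y in zip(doc_sets, labels) if t in s and y != category)
        c = n_pos - a
        d = n - n_pos - b
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        # Information gain: class entropy minus entropy conditioned on the term
        ig = _entropy([n_pos, n - n_pos]) - (
            (a + b) / n * _entropy([a, b]) + (c + d) / n * _entropy([c, d])
        )
        # Assumption: "local" document frequency counts documents in the category
        scores[t] = {"df": a, "chi2": chi2, "ig": ig}
    return scores


def select_features(docs, labels, k):
    """Union of the top-k terms per category under each of the three metrics."""
    selected = set()
    for category in sorted(set(labels)):
        scores = term_scores(docs, labels, category)
        for metric in ("df", "chi2", "ig"):
            # Ties are broken alphabetically so the result is deterministic
            ranked = sorted(scores, key=lambda t: (-scores[t][metric], t))
            selected.update(ranked[:k])
    return sorted(selected)


# Hypothetical toy corpus standing in for tokenized news documents
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["bank", "market", "price"],
    ["price", "bank", "loan"],
]
labels = ["sport", "sport", "economy", "economy"]
features = select_features(docs, labels, k=1)
```

Chi-square and information gain are symmetric, so a term that reliably signals the absence of a category scores as highly as one that signals its presence. Taking the union with document frequency keeps at least one positively associated term per category in this sketch.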

