Abstract:
Text classification is a technique that assigns textual information to a predefined set of categories. With the continuously increasing amount of online information, there is a pressing need to organize it, and automatic text classification is an essential solution in this regard. This study addresses multi-label news text classification and investigates whether a machine learning approach can be used to construct a classification system that extracts features automatically.
The main objective of natural language processing is to make computers perform tasks that would otherwise require human involvement, thereby reducing the labor, cost, and time devoted to them. These goals are pursued through tasks such as text classification, sentiment analysis, entity recognition, and information retrieval. However, classification accuracy decreases and computational complexity increases as the number of categories grows, especially in single-label text classification. The aim of this study is to design and model a scheme for multi-label Amharic text classification using convolutional neural networks (CNNs).
To achieve this objective and to understand the state of the art, relevant literature was reviewed to inform the design of an effective model. The proposed multi-label Amharic text classification model prepares word embeddings, accepts the tokenized input vectors from the embedding layer, and constructs a CNN by setting its parameters. The word embeddings are used individually and in various combinations through different channels of the CNN, followed by a single dense layer with six outputs, a sigmoid activation function, and a binary cross-entropy loss function to predict the class labels. Python was used to pre-process the data and to build the model.
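A minimal sketch of such an architecture is shown below: an embedding layer feeding parallel convolutional channels, concatenated and passed to a single six-output dense layer with sigmoid activation and binary cross-entropy loss. The vocabulary size, sequence length, embedding dimension, kernel sizes, and filter counts are illustrative assumptions, not values reported in the study.

```python
# Minimal sketch of a multi-channel CNN for multi-label text classification
# (Keras). Vocabulary size, sequence length, embedding dimension, kernel
# sizes and filter counts are illustrative assumptions.
from tensorflow.keras import layers, models

vocab_size = 20000   # assumed vocabulary size
seq_len = 200        # assumed maximum sequence length
embed_dim = 100      # assumed embedding dimension
num_labels = 6       # six class labels, as stated in the abstract

inputs = layers.Input(shape=(seq_len,), dtype="int32")
# Embedding layer; its weights could be initialized from pre-trained
# word2vec vectors instead of being learned from scratch.
x = layers.Embedding(vocab_size, embed_dim)(inputs)

# Parallel convolutional "channels" with different kernel sizes,
# mirroring the multi-channel CNN described above.
branches = []
for kernel_size in (3, 4, 5):
    b = layers.Conv1D(filters=128, kernel_size=kernel_size,
                      activation="relu")(x)
    b = layers.GlobalMaxPooling1D()(b)
    branches.append(b)
x = layers.Concatenate()(branches)

# Single dense output layer with six sigmoid units for multi-label
# prediction, trained with binary cross-entropy.
outputs = layers.Dense(num_labels, activation="sigmoid")(x)

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Sigmoid outputs with binary cross-entropy let each of the six labels be predicted independently, which is what distinguishes this multi-label setup from a softmax-based single-label classifier.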
Finally, the multi-label Amharic text classification model achieves an accuracy of 97.69%. The proposed model uses pre-trained word embeddings for the CNN and is compared against word2vec embeddings that are not pre-trained, with evaluation performed on a dataset with six class labels.