BDU IR

Multi-Class Classification of Racism in Amharic Text Using Machine Learning


dc.contributor.author Abebe, Desie
dc.date.accessioned 2024-12-05T07:11:28Z
dc.date.available 2024-12-05T07:11:28Z
dc.date.issued 2024-03-21
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/16269
dc.description.abstract Social media platforms such as Facebook and Telegram have become a prevalent means of communication worldwide. However, these platforms attract both desirable and undesirable interactions, and their anonymity, flexibility, and the ease of creating fake accounts under different user names and identities contribute to the posting of racist posts and comments. This study addresses this issue by applying deep learning and supervised machine learning approaches to classify racist comments and posts written in Amharic on Facebook and Telegram, platforms on which many users participate. The dataset of 13,015 instances was collected using Facepager, Beautiful Soup, and data export techniques. The collected data was annotated according to a guideline into classes such as non-racist, individual racist, regional racist, and country racist. The study found that the deep learning models long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM), along with the supervised machine learning models decision tree (DT), support vector machine (SVM), naïve Bayes (NB), and k-nearest neighbor (KNN), were effective algorithms for classifying the data. Word2vec was used for feature extraction, representing each word as a vector, which aided the understanding and classification of large amounts of data. The dataset was split into 80% for model training, 10% for testing, and 10% for validation. The best hyperparameters were selected to build an improved model for classifying racist posts and comments. Experimental results show that the Bi-LSTM model achieved the highest accuracy, 96%, making it a better multi-class racist text classification model than the other models evaluated. However, the study does not address classification of text with sentence-level labels.
Based on these findings, it is recommended that an Amharic racist text dataset for social media be prepared for multi-label classification of racist text. Keywords: Racism; Amharic Text Posts and Comments; Multi-Class Classification; Deep Learning; Supervised Machine Learning; Social Media en_US
dc.language.iso en_US en_US
dc.subject Computer Science en_US
dc.title Multi-Class Classification of Racism in Amharic Text Using Machine Learning en_US
dc.type Thesis en_US
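
The 80% / 10% / 10% split described in the abstract can be illustrated with a minimal sketch. This is not the thesis author's code: the function name `split_80_10_10`, the fixed seed, and the use of a plain random shuffle (rather than, say, a stratified split) are assumptions made for illustration, and integer indices stand in for the 13,015 annotated instances.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and return (train, test, validation) lists.

    Hypothetical helper: the abstract gives only the proportions,
    not how the split was actually performed.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * 8 // 10      # 80% for training
    n_test = len(shuffled) // 10           # 10% for testing
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder, about 10%
    return train, test, validation


# Placeholder indices standing in for the 13,015 annotated instances.
instances = list(range(13015))
train, test, validation = split_80_10_10(instances)
print(len(train), len(test), len(validation))
```

Because 13,015 is not divisible by 10, the three parts cannot be exactly 80/10/10; in this sketch the remainder goes to the validation set.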

