Multi-Class Classification of Racism in Amharic Text Using Machine Learning

Abebe, Desie

dc.contributor.author	Abebe, Desie
dc.date.accessioned	2024-12-05T07:11:28Z
dc.date.available	2024-12-05T07:11:28Z
dc.date.issued	2024-03-21
dc.identifier.uri	http://ir.bdu.edu.et/handle/123456789/16269
dc.description.abstract	Facebook and Telegram have become prevalent means of communication worldwide. However, these platforms attract both constructive and harmful interactions, and anonymity, flexibility, and the ease of creating fake accounts under different user names and identities contribute to racist posts and comments. This study addresses the issue by employing deep learning and supervised machine learning approaches to classify racist comments and posts written in Amharic on Facebook and Telegram, where many users participate. The dataset of 13,015 instances was collected using Facepager, BeautifulSoup, and data-export techniques. The collected data were annotated according to a guideline into classes such as non-racist, individual racist, regional racist, and country racist. The study found that the deep learning models long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM), along with supervised machine learning models such as decision tree (DT), support vector machine (SVM), naïve Bayes (NB), and k-nearest neighbor (KNN), were effective algorithms for classifying the data. Word2vec was used for feature extraction, representing each word as a vector. This approach aided the understanding and classification of large amounts of data. The dataset was split into 80% for model training, 10% for testing, and 10% for validation. The best hyperparameters were selected to construct a better model for racist text posts and comments. Experimental results show that the Bi-LSTM model achieved the best accuracy, 96%, making it a better multi-class racist text classification model than the others. However, the study does not address classification of text with sentence-level labels. 
Based on these findings, it is recommended to prepare an Amharic racist text dataset from social media for multi-label classification of racist text. Keywords: Racism; Amharic Text Posts and Comments; Multi-Class Classification; Deep Learning; Supervised Machine Learning; Social Media	en_US
dc.language.iso	en_US	en_US
dc.subject	Computer Science	en_US
dc.title	Multi-Class Classification of Racism in Amharic Text Using Machine Learning	en_US
dc.type	Thesis	en_US