Amharic Racism Text Detection from The Writer Using Machine learning approach

Dereje, Sintayehu Aweke

URI: http://ir.bdu.edu.et/handle/123456789/15402

Date: 2022-10

Abstract:

Amharic is one of the official languages of the Federal Democratic Republic of Ethiopia. There are many historic Amharic and Ethiopic written documents addressing relevant issues including governance, science, religion, social rules, culture, and art, which together form a broad body of indigenous knowledge. Written documents in Amharic carry racism-related ideas that depend on the writer's preferences. Racism is currently a burning issue in Ethiopia, yet it has not been specifically tackled in offline and online Amharic text, especially through scientific studies and approaches. This study uses machine learning techniques to create a model for identifying racist text in Amharic in both online and offline sources. The dataset is built from text collected from online and offline sources and organized into documents. Preprocessing and post-processing help to reduce unclear outcomes. The study followed an experimental methodology, applying various feature extraction techniques, algorithms, and models. TF-IDF features were extracted, with TF-IDF combined with n-grams and a bag of words. SVM, BiLSTM, and RF were employed for classification. Each experiment was evaluated using 5-fold cross-validation. The dataset was divided into two categories, racism and non-racism. The experiment based on SVM classification with bag-of-words feature extraction performed better than RF classification. In this work, an experimental performance result of 89.33% was obtained on Amharic documents. In the BiLSTM experiment, increasing the number of epochs and the data size improved the model's learning and thus the outcome. The model, on the other hand, still misclassifies racism and non-racism, with indirect racism inaccuracies accounting for 10.67% of the errors. According to the study's findings, indirect Amharic racism was more difficult to detect than direct racism. In general, a machine learning technique based on a binary dataset performs well for racism detection.
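To make the feature extraction and evaluation steps described in the abstract more concrete, the following is a minimal sketch, not the study's actual implementation. It computes TF-IDF weights over word n-grams and produces 5-fold train/test index splits using only the Python standard library. The function names, the whitespace tokenization, the choice of unigrams plus bigrams, and the smoothed IDF formula are all assumptions made for illustration; the classifiers themselves (SVM, RF, BiLSTM) are not shown.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) from whitespace-tokenized text.

    Whitespace tokenization is an assumption; the study's own
    preprocessing for Amharic is not described in the abstract.
    """
    tokens = text.split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Compute one sparse TF-IDF vector (dict: term -> weight) per document.

    Uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, which is one
    common variant and an assumption here.
    """
    grams_per_doc = [word_ngrams(d, n_max) for d in docs]
    doc_freq = Counter()
    for grams in grams_per_doc:
        doc_freq.update(set(grams))  # count each term once per document
    n_docs = len(docs)
    vectors = []
    for grams in grams_per_doc:
        term_counts = Counter(grams)
        total = len(grams) or 1  # avoid division by zero on empty docs
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        })
    return vectors


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


if __name__ == "__main__":
    # Placeholder documents; real input would be labeled Amharic text.
    docs = ["w1 w2 w3", "w1 w4", "w2 w5 w6", "w1 w2", "w7 w8"]
    vectors = tfidf_vectors(docs)
    for train, test in k_fold_indices(len(docs), k=5):
        print("train:", train, "test:", test)
```

In a full pipeline, each fold's training vectors would be passed to a classifier and the held-out fold used for scoring, with the reported figure averaged over the five folds. In practice the data would usually be shuffled or stratified by class before splitting, which this sketch omits.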
