Abstract:
Amharic is one of the official languages of the Federal Democratic Republic of Ethiopia. There are
many historic Amharic and Ethiopic written documents addressing a range of issues, including
governance, science, religion, social rules, culture, and art, which together form a broad body of
indigenous knowledge. Written documents in Amharic may also express racism-related ideas,
depending on the writer's preferences. Racism is currently a burning issue in Ethiopia, yet it has
not been specifically addressed in offline and online Amharic text, especially through scientific
studies and approaches. This study uses machine learning techniques to create a model for
identifying racist text in Amharic from both online and offline sources. The dataset is built from
documents collected from online and offline sources. Preprocessing and post-processing help to
reduce unclear outcomes.
The study used an experimental methodology, applying various feature extraction techniques,
algorithms, and models. Features were extracted using TF-IDF, TF-IDF combined with n-grams,
and a bag of words. SVM, BiLSTM, and RF were employed for classification. Each experiment
was evaluated using 5-fold cross-validation.
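As an illustration of the feature extraction and evaluation steps named above, the following is a minimal sketch in plain Python of TF-IDF weighting over word n-grams and of a 5-fold split. It is not the study's implementation: the function names, the unsmoothed weighting formula (tf = count / document length, idf = log(N / df)), the contiguous unshuffled folds, and the placeholder tokens are all assumptions made for the example.

```python
import math
from collections import Counter


def word_ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_max=2):
    """Return one {n-gram: weight} dict per document.

    Assumed weighting: tf = count / number of n-grams in the document,
    idf = log(number of documents / document frequency), no smoothing.
    """
    grams_per_doc = [word_ngrams(d.split(), n_max) for d in docs]
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))  # count each n-gram once per document
    n_docs = len(docs)
    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # guard against empty documents
        vectors.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return vectors


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds, without shuffling."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# Placeholder documents: w1, w2, ... stand in for preprocessed Amharic tokens.
docs = ["w1 w2 w3", "w1 w4", "w5 w2 w3", "w6 w7", "w1 w2",
        "w3 w4 w5", "w2 w6", "w7 w1", "w4 w4 w2", "w5 w6 w7"]
features = tfidf(docs, n_max=2)
folds = list(k_fold_indices(len(docs), k=5))
```

Each fold would then serve once as the held-out test set while a classifier such as SVM or RF is trained on the remaining four folds, and the five scores are averaged.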
The feature extraction techniques were applied to a dataset whose documents were divided into
two categories, racism and non-racism. The experiment combining SVM classification with
bag-of-words feature extraction outperformed RF classification. Overall, this work obtained an
experimental performance result of 89.33% on Amharic documents. In the BiLSTM experiment,
increasing the number of epochs and the data size improved the model's learning and thus the
outcome.
The model does, however, regularly confuse racism and non-racism, with misclassified indirect
racism accounting for 10.67% of the errors. According to the study's findings, indirect racism in
Amharic was more difficult to detect than direct racism. In general, a machine learning technique
based on a binary dataset works well for racism detection.