Abstract:
Amharic, with 65 million speakers, is the second most spoken Semitic language in Sub-Saharan Africa
after Arabic, which has over 300 million speakers. Some of the content posted on Facebook pages
contains hate speech. Hate speech generally refers to expressions, speech, gestures, or writing that
advocate, threaten, or encourage violent acts towards someone based on gender, religion, political
views, or disability. In recent years, social activity on the internet, especially on platforms such as
Facebook, has increased dramatically. Unfortunately, social media platforms like Facebook have also become
channels for the dissemination of hate speech, which disrupts the social lives of the majority
of people and leads to conflict. To address this problem, this research develops an Amharic
hate speech detection model using deep learning algorithms. In this study, a new Amharic hate speech
dataset was prepared from posts and comments collected from selected groups and individual channels
on Facebook, Twitter, and YouTube. The experiments used 113,959 of the 308,160 collected posts and
comments to train and test the models. Keras embedding layers are used for feature extraction in
the deep learning models: Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory
(BILSTM), Gated Recurrent Unit (GRU), and Multilayer Perceptron (MLP). The experiment was
conducted on those four models by
using 80% of the dataset for training and the remaining 20% for testing. The models were
evaluated using precision, recall, and F-measure, and they showed promising results. Each model was
trained and tested individually. In the experiments conducted, BILSTM and GRU
achieved the highest accuracy (91%), while LSTM and MLP achieved 90% accuracy. One of the
challenges during the experiments was scraping the dataset and labeling the scraped comments.
Despite these challenges, the models detected Amharic hate speech with good
performance.
Keywords: Amharic language, Hate Speech, Classification, Word Embedding, Deep Learning
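
The evaluation protocol summarized in the abstract (an 80%/20% train/test split, scored with precision, recall, and F-measure) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `split_80_20` and `precision_recall_f1`, the binary labels (1 = hate, 0 = not hate), and the fixed shuffle seed are all assumptions made for illustration.

```python
import random


def split_80_20(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the items and split them into train and test parts.

    Hypothetical helper: the seed and shuffling strategy are assumptions,
    not details taken from the study.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F-measure (F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example with made-up labels (1 = hate speech, 0 = not hate speech).
train, test = split_80_20(range(10))
p, r, f1 = precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

In practice the predictions would come from the trained LSTM, BILSTM, GRU, or MLP model applied to the held-out 20%, and the three scores would be reported per model.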