dc.description.abstract |
Natural language processing (NLP) can fill real-world gaps by enabling machines to perform
human language tasks in place of human beings; in this way, it is possible to solve more than
fifty percent of the world's problems. NLP tasks include natural language inference, semantic
similarity measurement, semantic analysis, semantic role labeling, sentiment analysis, and
sarcasm detection. These tasks focus on understanding the meaning and role of each word in
a sentence. Opinion mining is a typical natural language understanding task that identifies
the polarity of a given sentence or document. However, it is very challenging because of
sarcastic expressions. The goal of sarcasm detection is to determine whether a sentence is
sarcastic or non-sarcastic. In this study, we propose a sarcasm detection model for Amharic
based on classic (shallow) machine learning approaches. We collected data from the Amharic
book Shimut and Mitsetoch by Abe Tokyo, his Facebook channel, and other Telegram and Facebook
channels that focus on sarcasm, and annotated the collected sentences as sarcastic or
non-sarcastic. After preprocessing (normalization, tokenization, and removal of non-Amharic
components), we extracted inter-class and intra-class frequency features and applied
threshold-based feature selection, with a minimum threshold for intra-class frequency and a
maximum threshold for inter-class frequency. Because machine learning algorithms cannot take
raw text as input, the text must be encoded in a numeric format; we applied TF-IDF to
vectorize the dataset for the traditional machine learning models. Finally, we implemented
Artificial Neural Network (ANN), Nearest Neighbor, Support Vector Machine, Random Forest, and
Naïve Bayes classifiers. In our experiments, ANN outperformed the traditional machine learning
classifiers, achieving 98.09% training accuracy and 94.05% testing accuracy.
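The preprocessing, threshold-based feature selection, and TF-IDF steps described in this abstract can be illustrated with a small pipeline. This is a minimal standard-library Python sketch under stated assumptions, not the study's implementation. The following are all assumptions: the homophone normalization table; reading intra-class frequency as a token's document frequency inside one class and inter-class frequency as its document frequency in the other classes; the threshold values (`min_intra=2`, `max_inter=0`); the smoothed IDF formula; and the use of a 1-nearest-neighbour classifier, chosen from the listed classifier families only because it needs no external library. Every function name is hypothetical, and the training sentences are made-up toy data.

```python
import math
import re
from collections import Counter

# ASSUMPTION: normalization maps homophone character rows to one representative
# (first seven orders of each row). A common Amharic choice; the study's rules are not given.
_HOMOPHONE_ROWS = {
    0x1210: 0x1200,  # ሐ row -> ሀ row
    0x1280: 0x1200,  # ኀ row -> ሀ row
    0x1220: 0x1230,  # ሠ row -> ሰ row
    0x12D0: 0x12A0,  # ዐ row -> አ row
    0x1340: 0x1338,  # ፀ row -> ጸ row
}
_NORMALIZE = {src + i: dst + i for src, dst in _HOMOPHONE_ROWS.items() for i in range(7)}
# Anything outside the Ethiopic syllable/mark range U+1200-U+135F (Latin letters,
# digits, Ethiopic and ASCII punctuation, whitespace) becomes a token boundary.
_NON_AMHARIC = re.compile(r"[^\u1200-\u135F]+")


def preprocess(text):
    """Normalize, remove non-Amharic components, and tokenize on the gaps."""
    return _NON_AMHARIC.sub(" ", text.translate(_NORMALIZE)).split()


def select_features(docs, labels, min_intra=2, max_inter=0):
    """Hypothetical reading of the thresholds: keep a token if, for some class c,
    its document frequency inside c (intra-class) is >= min_intra and its
    document frequency in all other classes (inter-class) is <= max_inter."""
    df = {c: Counter() for c in set(labels)}
    for tokens, c in zip(docs, labels):
        df[c].update(set(tokens))
    vocab = set()
    for c, counts in df.items():
        for tok, intra in counts.items():
            inter = sum(df[o][tok] for o in df if o != c)
            if intra >= min_intra and inter <= max_inter:
                vocab.add(tok)
    return sorted(vocab)


def fit_idf(docs, vocab):
    """Smoothed IDF over the training documents: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for tokens in docs for tok in set(tokens))
    return {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}


def tfidf(tokens, vocab, idf):
    """TF (count / number of in-vocabulary tokens) times IDF, in vocab order."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return [counts[t] / total * idf[t] for t in vocab]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_1nn(train_vecs, train_labels, vec):
    """1-nearest-neighbour by cosine similarity (ties go to the earliest example)."""
    best = max(range(len(train_vecs)), key=lambda i: cosine(train_vecs[i], vec))
    return train_labels[best]


# Toy, made-up training data for illustration only (not from the study's dataset).
train_texts = ["ዛሬ በጣም ጥሩ ቀን ነው።", "በጣም ጥሩ ነው!", "ዛሬ ሰላም ነው።", "ሠላም ቀን ነው hello"]
train_labels = ["sarcastic", "sarcastic", "non-sarcastic", "non-sarcastic"]

train_docs = [preprocess(t) for t in train_texts]
vocab = select_features(train_docs, train_labels)
idf = fit_idf(train_docs, vocab)
train_vecs = [tfidf(d, vocab, idf) for d in train_docs]

print(vocab)
print(predict_1nn(train_vecs, train_labels, tfidf(preprocess("በጣም ጥሩ"), vocab, idf)))
print(predict_1nn(train_vecs, train_labels, tfidf(preprocess("ሰላም ነው"), vocab, idf)))
```

Running the sketch prints the selected vocabulary and the labels predicted for two toy inputs. Keeping feature selection separate from TF-IDF weighting lets the thresholds be tuned without touching the vectorizer. In this sketch, any other classifier (for example, an ANN, SVM, Random Forest, or Naïve Bayes model) could consume the same vectors in place of the nearest-neighbour step.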
Keywords: Amharic Lemmatized Text, Sarcasm Detection Model |
en_US |