Sarcasm Detection Model for Amharic Lemmatize Text Using Machine Learning

Biniyam, Damtie

dc.contributor.author	Biniyam, Damtie
dc.date.accessioned	2024-12-06T07:16:46Z
dc.date.available	2024-12-06T07:16:46Z
dc.date.issued	2023-06
dc.identifier.uri	http://ir.bdu.edu.et/handle/123456789/16300
dc.description.abstract	By using natural language processing (NLP) to let machines perform human-language tasks in place of people, it may be possible to address more than fifty percent of the world’s problems. NLP covers tasks such as natural language inference, semantic similarity measurement, semantic analysis, semantic role labeling, sentiment analysis, and sarcasm detection. These tasks focus on understanding the meaning and role of each word in a sentence. Opinion mining is a typical natural language understanding task that identifies the polarity of a given sentence or document. However, it is very challenging because of sarcastic expressions. In this study, we propose a sarcasm detection model for the Amharic language based on classical (shallow) machine learning approaches. We collected data from Abe Tokyo's book Amharic Shimut and Mitsetoch, his Facebook channel, and other Telegram and Facebook channels focusing on sarcasm. We annotated the collected sentences as sarcastic or non-sarcastic. After applying normalization, tokenization, and removal of non-Amharic components as preprocessing steps, we extracted features based on inter-class and intra-class frequency. We then applied threshold-based feature selection, using a minimum threshold for intra-class frequency and a maximum threshold for inter-class frequency. Finally, we implemented Artificial Neural Network (ANN), Nearest Neighbor, Support Vector Machine, Random Forest, and Naïve Bayes classifiers. In our experiments, the ANN outperformed the traditional machine learning classifiers, achieving 98.09% training accuracy and 94.05% testing accuracy. Machine learning algorithms cannot take raw text as input, so the text must first be encoded in a numeric format; TF-IDF was applied to vectorize the dataset for the traditional machine learning models. The goal of sarcasm detection is to determine whether a sentence is sarcastic or non-sarcastic. Key Words: Amharic Lemmatize Text, Sarcasm Detection Model	en_US
dc.language.iso	en_US	en_US
dc.subject	Information Technology	en_US
dc.title	Sarcasm Detection Model for Amharic Lemmatize Text Using Machine Learning	en_US
dc.type	Thesis	en_US
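
The preprocessing and TF-IDF encoding steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation. The Unicode range used to detect Amharic characters, the whitespace tokenizer, the tf * (ln(N / df) + 1) weighting, the function names, and the two sample sentences are all assumptions made for illustration. The record does not state which TF-IDF variant was used.

```python
import math
import re
from collections import Counter

# Assumption: "Amharic components" are Ethiopic syllables and combining marks
# (U+1200-U+135F). Everything else (Latin text, digits, emoji, punctuation)
# is replaced by a space.
NON_AMHARIC = re.compile(r"[^\u1200-\u135F]+")


def tokenize(sentence):
    """Remove non-Amharic characters, then split on whitespace."""
    return NON_AMHARIC.sub(" ", sentence).split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with weight tf * (ln(N / df) + 1).

    tf is the term count divided by the document length, N is the number
    of documents, and df is the number of documents containing the term.
    """
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([
            (counts[tok] / total) * (math.log(n_docs / doc_freq[tok]) + 1.0)
            for tok in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical sample sentences, not taken from the thesis dataset.
docs = ["በጣም ጥሩ ነው!", "ጥሩ ነው lol 123"]
vocabulary, vectors = tfidf_vectors(docs)
```

Each row of vectors is a numeric encoding of one sentence that a classifier such as an SVM or Random Forest could take as input. Library implementations such as scikit-learn's TfidfVectorizer apply different smoothing and normalization by default, so their values will not match this sketch exactly.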