dc.description.abstract |
The growing number of social media platforms enables individuals to express their ideas and
views on a wide range of topics. People can now express their stance toward any topic by
commenting or tweeting on platforms such as Facebook, YouTube, and Twitter. Stance
classification is the task of automatically determining whether the author of a text is in favor of,
against, or neutral toward a topic or target. Stance classification has been researched for high-resource
languages such as English. However, existing datasets and models for high-resource languages
cannot be applied to Amharic because of differences in context, morphology, and character
representation. To the best of our knowledge, no prior research has addressed Amharic stance
classification. Stance classification requires a defined target or topic in order to assess the overall attitude
toward the target or topic. In this study, we use multi-task learning to build an
Amharic stance classification model. Our model jointly learns the sentiment classification, target
identification, and stance classification tasks. We collect an Amharic corpus from
social media by web scraping. To prepare our dataset, we filter the collected corpus
using keywords to extract the comments or tweets that best describe the four selected targets or topics
of our study (Abiy Ahmed, Green Legacy, Tigray War, and New Currency). For data annotation,
we build an Android stance annotator application with cloud-based data storage. Using the
application, five annotators annotate the filtered dataset. For text representation, we build
a Word2Vec embedding model using the collected corpus. To build our model, we use a
combination of CNN and Bi-LSTM networks. We employ the CNN for text feature extraction
and a stacked Bi-LSTM followed by fully connected layers as the classifier. Our model treats
sentiment classification and target identification as auxiliary tasks. We find that using sentiment
information as an auxiliary task can improve the performance of Amharic stance classification. From our experiment,
we observe that Amharic sentiment and stance do not always align for the same comment or tweet.
We compare the performance of different deep learning algorithms on Amharic stance
classification. Our multi-task model shows improved performance compared to single-task
deep learning models, which require a separate independent model for each task. Our model achieves
an F1-score of 86% for task 1 (target identification), 65% for task 2 (stance classification), and 50%
for task 3 (sentiment classification).
Keywords: Amharic sentiment and stance classification, Amharic stance target identification,
multi-task learning |
en_US |