Abstract:
Amharic is the official language of Ethiopia and is spoken by millions of people inside and outside the country. It is the second most widely spoken Semitic language after Arabic, which makes dimensional speech emotion recognition (SER) for Amharic a promising line of work. Recent research on emotion recognition has mostly followed a categorical approach. However, a categorical approach, which assigns each utterance to one discrete class, cannot represent every emotion, since emotions fall into more than 68 classes. Prior research also indicates that dimensional emotion recognition can capture more nuance than categorical emotion recognition. Although dimensional emotion recognition is promising, results for valence are typically lower than for arousal and dominance, because people use similar sounds to express both pleasure and displeasure. Anger and happiness in particular are difficult to distinguish, as speakers express them in acoustically similar ways. To handle this challenge, both linguistic and acoustic features are used. In this research, an annotation team annotated the ASED dataset with valence, arousal, and dominance (VAD) labels. This study aims to build a dimensional Amharic SER model that overcomes the above problems. Furthermore, we evaluated deep learning models such as LSTM and BiLSTM and identified the most suitable one for recognizing Amharic emotions dimensionally. Our model achieves 98% on categorical speech emotion recognition. With bimodal features, the mean square error for valence, arousal, and dominance is 0.0081, 0.0655, and 0.0239, respectively. The mean absolute error for valence, arousal, and dominance is 0.00049, 0.0321, and 0.0061, with concordance correlation coefficients of 1.000, 0.8738, and 0.9775, respectively.
Keywords: Emotion Recognition, Amharic Language, VAD Annotation, Deep Learning, Dimensional Emotion Recognition, Speech Processing, Text Processing
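
The abstract reports three regression metrics per VAD dimension: mean square error (MSE), mean absolute error (MAE), and the concordance correlation coefficient (CCC). The following is a minimal sketch of how these metrics are commonly defined, not the implementation used in this study. The function names and the toy values are illustrative assumptions, and the CCC uses the standard population-statistics form.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean square error between annotated and predicted scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def mae(y_true, y_pred):
    """Mean absolute error between annotated and predicted scores."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def ccc(y_true, y_pred):
    """Concordance correlation coefficient.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2),
    computed with population (biased) variance and covariance.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    denom = y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2
    if denom == 0.0:
        # Both inputs are constant and equal: CCC is undefined.
        return float("nan")
    return float(2.0 * cov / denom)


if __name__ == "__main__":
    # Toy valence scores (made up for illustration, not data from the study).
    annotated = [0.10, 0.40, 0.75, 0.90]
    predicted = [0.15, 0.35, 0.70, 0.95]
    print("MSE:", mse(annotated, predicted))
    print("MAE:", mae(annotated, predicted))
    print("CCC:", ccc(annotated, predicted))
```

Unlike Pearson correlation, CCC also penalizes a shift or scale difference between predictions and annotations, so a value of 1.000 requires predictions that match the annotations exactly, not merely ones that follow the same trend.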