BDU IR

Dimensional Amharic Speech Emotion Recognition Model Using Deep Learning


dc.contributor.author ASSEFA, BELAY AWEKE
dc.date.accessioned 2025-02-14T11:29:28Z
dc.date.available 2025-02-14T11:29:28Z
dc.date.issued 2024-11
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/16439
dc.description.abstract Amharic is the official language of Ethiopia and is spoken by millions of people inside and outside the country. It is the second most widely spoken Semitic language after Arabic, which makes dimensional speech emotion recognition (SER) for Amharic a promising line of work. Previous research on emotion recognition has mostly followed a categorical approach. However, classifying emotions into discrete classes cannot represent every emotion, as there are more than 68 emotion classes. Researchers also agree that dimensional emotion recognition can capture more nuance than categorical emotion recognition. Although dimensional emotion recognition is promising, results for valence are lower than those for arousal and dominance because people use the same sounds to express pleasure and displeasure. Researchers have found it challenging to distinguish angry from happy emotions because people speak in a similar way in both states. To handle this challenge, this study uses mechanisms such as combining linguistic and acoustic features. In this research, an annotation team annotated the ASED dataset with valence, arousal, and dominance (VAD) labels. The study aimed to develop a dimensional Amharic SER model that overcomes the above problems. Furthermore, we used deep learning models such as LSTM and BiLSTM and identified the one most suitable for recognizing Amharic emotions dimensionally. Our model's performance on categorical speech emotion recognition is 98%. With bimodal features, the mean square error for valence, arousal, and dominance is 0.0081, 0.0655, and 0.0239, respectively. The mean absolute error for valence, arousal, and dominance is 0.00049, 0.0321, and 0.0061, with concordance correlation coefficients of 1.000, 0.8738, and 0.9775, respectively. Keywords: Emotion Recognition, Amharic Language, VAD Annotation, Deep Learning, Dimensional Emotion Recognition, Speech Processing, Text Processing en_US
dc.language.iso en_US en_US
dc.subject Computer Science en_US
dc.title Dimensional Amharic Speech Emotion Recognition Model Using Deep Learning en_US
dc.type Thesis en_US
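
The abstract reports mean square error, mean absolute error, and concordance correlation coefficient (CCC) for each VAD dimension. As a rough illustration of how these three metrics are commonly computed, here is a minimal sketch in pure Python. It is not taken from the thesis: the function names and the sample values are hypothetical, and the CCC follows the standard definition using population (biased) variances, which may differ from the exact variant the author used.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean square error between two equal-length sequences."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def ccc(y_true, y_pred):
    """Concordance correlation coefficient (standard definition).

    CCC = 2 * cov / (var_true + var_pred + (mean_true - mean_pred)^2),
    using population variance and covariance. Undefined (division by
    zero) if both sequences are constant and have the same mean.
    """
    mu_t, mu_p = fmean(y_true), fmean(y_pred)
    var_t = fmean((t - mu_t) ** 2 for t in y_true)
    var_p = fmean((p - mu_p) ** 2 for p in y_pred)
    cov = fmean((t - mu_t) * (p - mu_p) for t, p in zip(y_true, y_pred))
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)


# Hypothetical valence annotations and predictions, for illustration only.
valence_true = [0.10, 0.50, 0.90, 0.30]
valence_pred = [0.12, 0.48, 0.85, 0.35]

print("MSE:", mse(valence_true, valence_pred))
print("MAE:", mae(valence_true, valence_pred))
print("CCC:", ccc(valence_true, valence_pred))
```

Unlike plain correlation, CCC also penalizes a constant offset between predictions and annotations, which is one reason it is often reported alongside error metrics for dimensional emotion recognition.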

