Abstract:
Ethiopian Sign Language (EthSL) is the primary means of communication for the deaf
community in Ethiopia. The dearth of accessible tools that convert Amharic speech into
EthSL, however, makes it difficult for hearing and deaf people to communicate. To close this gap, this study develops a deep learning-based model for translating Amharic speech into EthSL video. Four deep learning models were
compared: Long Short-Term Memory (LSTM) networks, Variational Autoencoders
(VAEs), Recurrent Neural Networks (RNNs), and Convolutional Neural Networks
(CNNs). The LSTM achieved the highest accuracy, improving from 91% to 97% after hyperparameter optimization. To ensure model performance, interpretability, and usability, the study followed a design science methodology. Integrated Gradients were used to increase the transparency of the model's decision-making process. A mobile application built with Flutter and incorporating the trained LSTM model was evaluated for usability through surveys.
Results show that, compared with earlier deep learning models, the LSTM-based model considerably improves the accuracy of Amharic speech-to-EthSL translation.
The prototype improves accessibility for the deaf community by enabling real-time EthSL
video creation from Amharic speech. By offering a dependable and understandable
solution, encouraging inclusivity, and setting the stage for future developments in Amharic
speech-to-sign language translation, this research advances assistive technology.