dc.description.abstract |
Sign language is a vital means of communication for Deaf communities, encompassing complex
gestures and facial expressions to convey meaning. Recognizing sign language through
automated systems presents a significant challenge due to its dynamic nature and the variability
among individual signers. For effective communication, Ethiopian Sign Language (EthSL)
relies on the accurate recognition of continuous sequences of signs. Previous research efforts
have primarily concentrated on recognizing isolated signs at the character, word, and phrase levels.
While these studies have been valuable, they fall short of capturing the full complexity and
fluidity of natural sign language conversations, which occur at the continuous (sentence) level.
Previous researchers have used traditional methods for EthSL recognition, typically decomposing
the task into separate stages. The primary problem addressed in this study is the recognition
of continuous EthSL using an end-to-end approach, which involves understanding sequences of signs
performed by multiple signers. This task is crucial for creating practical sign language recognition
systems for real-world use, such as translation services and communication aids for the
Deaf community.
To address the challenge of continuous EthSL recognition, we created the first-ever dataset for
Continuous (Sentence-Level) EthSL. This dataset features recordings of 30 unique sentences
performed by 22 different signers. Each signer performed each sentence twice, resulting in a
total of 1,320 sentences. Our methodology uses an end-to-end approach that employs a 2D convolution
to extract spatial features from individual video frames, followed by a 1D convolution to
capture short-term motion details. These spatial and temporal features are then processed using
a Bidirectional Long Short-Term Memory (BiLSTM) network to recognize long-term temporal
patterns and the sequential nature of sign language. Finally, a classifier with Connectionist
Temporal Classification (CTC) is used to align and recognize the continuous sign language sequences.
This integrated approach effectively harnesses both spatial and temporal information
to improve recognition accuracy.
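As a minimal illustrative sketch of this pipeline (not the thesis implementation; the framework, layer sizes, and vocabulary size below are assumptions chosen for readability), the described 2D-CNN, 1D-CNN, BiLSTM, and CTC stages could be expressed in PyTorch roughly as follows:

import torch
import torch.nn as nn

class ContinuousSLR(nn.Module):
    def __init__(self, vocab_size=100, hidden=256):  # illustrative sizes, not the thesis values
        super().__init__()
        # 2D convolutions: spatial features from each video frame
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # 1D convolution over time: short-term motion details
        self.temporal = nn.Conv1d(64, hidden, kernel_size=5, padding=2)
        # BiLSTM: long-term temporal patterns across the sign sequence
        self.bilstm = nn.LSTM(hidden, hidden, bidirectional=True, batch_first=True)
        # Classifier over the gloss vocabulary plus the CTC blank token
        self.fc = nn.Linear(2 * hidden, vocab_size + 1)

    def forward(self, x):                        # x: (batch, time, 3, H, W)
        b, t = x.shape[:2]
        f = self.spatial(x.flatten(0, 1))        # (b*t, 64, 1, 1)
        f = f.reshape(b, t, -1).transpose(1, 2)  # (b, 64, t)
        f = torch.relu(self.temporal(f)).transpose(1, 2)  # (b, t, hidden)
        f, _ = self.bilstm(f)                    # (b, t, 2*hidden)
        return self.fc(f).log_softmax(-1)        # frame-wise gloss log-probabilities

# Training would align the frame-wise outputs with the target gloss sequences via
# nn.CTCLoss, which marginalizes over all monotonic alignments, so no frame-level
# annotation of the continuous videos is required.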
The system was evaluated on test splits designed for signer independence and for unseen
sentences, achieving Word Error Rates (WER) of 8.02% and 47.02%, respectively. These results
highlight the method's robustness to variability among unseen signers, demonstrating its
strength in signer-independent recognition. The 47.02% WER on the unseen-sentence split is a
notable result, but it also indicates a substantial area for improvement.
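(For reference, WER is the standard metric for continuous recognition: it counts the minimum
number of substitutions S, deletions D, and insertions I needed to turn the recognized gloss
sequence into a reference sequence of N glosses, i.e. WER = (S + D + I) / N.)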
To address this, future research should focus on expanding the dataset to include a wider
variety of sentences, potentially through collaborative data collection efforts. Overall, the
study presents a significant advancement in continuous EthSL recognition, demonstrating a
robust approach to signer independence while highlighting the need for further work to improve
performance on unseen sentences.
Keywords: Ethiopian Sign Language; Sign Language Recognition; Deep Learning; Continuous SLR |
en_US |