
End-to-End Continuous Ethiopia Sign Language Recognition


dc.contributor.author Anteneh, Yehalem
dc.date.accessioned 2024-12-05T07:18:50Z
dc.date.available 2024-12-05T07:18:50Z
dc.date.issued 2024-07
dc.identifier.uri http://ir.bdu.edu.et/handle/123456789/16272
dc.description.abstract Sign language is a vital means of communication for Deaf communities, combining complex gestures and facial expressions to convey meaning. Recognizing sign language automatically is a significant challenge because of its dynamic nature and the variability among individual signers. Effective communication in Ethiopian Sign Language (EthSL) depends on accurate recognition of continuous sequences of signs. Previous research has concentrated on recognizing isolated signs at the character, word, and phrase levels. While valuable, these studies fall short of capturing the full complexity and fluidity of natural signed conversation, which occurs at the continuous (sentence) level, and they have relied on traditional methods that decompose recognition into separate stages. The primary problem addressed in this study is end-to-end recognition of continuous EthSL, that is, understanding sequences of signs performed by multiple signers. Solving it is crucial for practical sign language recognition systems in real-world use, such as translation services and communication aids for the Deaf community.

To address this challenge, we created the first dataset for continuous (sentence-level) EthSL. It contains recordings of 30 unique sentences performed by 22 different signers; each signer performed each sentence twice, for a total of 1,320 sentence recordings.

Our methodology is an end-to-end pipeline: a 2D convolution extracts spatial features from individual video frames, a 1D convolution captures short-term motion across frames, a Bidirectional Long Short-Term Memory (BiLSTM) network models long-term temporal patterns and the sequential nature of sign language, and a classifier trained with Connectionist Temporal Classification (CTC) aligns and recognizes the continuous sign sequences (illustrative sketches of this pipeline follow the record below). This integrated approach exploits both spatial and temporal information to improve recognition accuracy.

The system was evaluated on two test splits, one for signer independence and one for unseen sentences, achieving Word Error Rates (WER) of 8.02% and 47.02%, respectively. The first result demonstrates the method's robustness to variability across signers it has never seen, that is, strong signer independence. The 47.02% WER on the unseen-sentences split is a useful baseline but leaves substantial room for improvement; future research should expand the dataset to cover a wider variety of sentences, potentially through collaborative data collection efforts. Overall, the study presents a significant advance in continuous EthSL recognition with a robust approach to signer independence, while highlighting the need for further work on unseen sentence data.

Keywords: Ethiopian Sign Language; Sign Language Recognition; Deep Learning; Continuous SLR
dc.language.iso en_US
dc.subject Computer Science
dc.title End-to-End Continuous Ethiopia Sign Language Recognition
dc.type Thesis
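
The pipeline described in the abstract (2D convolution for per-frame spatial features, 1D convolution for short-term motion, BiLSTM for long-term temporal structure, and a CTC-trained classifier) can be sketched concretely. The following Python/PyTorch sketch is illustrative only, not the thesis implementation: the layer sizes, kernel sizes, 112x112 input resolution, gloss vocabulary size, and blank index are all assumptions.

import torch
import torch.nn as nn

class ContinuousSLRModel(nn.Module):
    """Sketch of a 2D-conv -> 1D-conv -> BiLSTM -> CTC pipeline.
    All layer sizes are illustrative assumptions, not the thesis values."""

    def __init__(self, vocab_size: int, hidden: int = 512):
        super().__init__()
        # 2D convolutions extract spatial features from each video frame.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B*T, 64, 1, 1)
        )
        # A 1D convolution over the time axis captures short-term motion.
        self.temporal = nn.Sequential(
            nn.Conv1d(64, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # BiLSTM models long-term temporal patterns in the sign sequence.
        self.bilstm = nn.LSTM(hidden, hidden // 2, num_layers=2,
                              bidirectional=True, batch_first=True)
        # Linear classifier over the gloss vocabulary plus the CTC blank.
        self.classifier = nn.Linear(hidden, vocab_size + 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        b, t, c, h, w = frames.shape
        x = self.spatial(frames.view(b * t, c, h, w)).view(b, t, -1)
        x = self.temporal(x.transpose(1, 2)).transpose(1, 2)   # (B, T, hidden)
        x, _ = self.bilstm(x)
        logits = self.classifier(x)                             # (B, T, V+1)
        # nn.CTCLoss expects (T, B, V+1) log-probabilities.
        return logits.log_softmax(dim=-1).transpose(0, 1)

# Illustrative training step with CTC loss (all numbers hypothetical).
model = ContinuousSLRModel(vocab_size=100)
ctc = nn.CTCLoss(blank=100, zero_infinity=True)   # blank index = vocab_size
frames = torch.randn(2, 40, 3, 112, 112)          # 2 clips of 40 frames
targets = torch.randint(0, 100, (2, 6))           # gloss label sequences
log_probs = model(frames)                         # (40, 2, 101)
input_lens = torch.full((2,), 40, dtype=torch.long)
target_lens = torch.full((2,), 6, dtype=torch.long)
loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()

CTC is what makes the approach end-to-end: the network emits one label distribution per frame without needing frame-level annotations, and the loss marginalizes over every frame alignment that collapses to the target gloss sequence.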
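At inference time the abstract says CTC is used to align and recognize the continuous sequences. A common decoding strategy, assumed here rather than confirmed by the abstract, is best-path (greedy) decoding: take the argmax label per frame, collapse repeats, and drop blanks.

import torch

def greedy_ctc_decode(log_probs: torch.Tensor, blank: int) -> list:
    """Best-path CTC decoding. log_probs: (T, B, V+1) as produced above."""
    best = log_probs.argmax(dim=-1).transpose(0, 1)   # (B, T) label paths
    decoded = []
    for path in best.tolist():
        seq, prev = [], blank
        for label in path:
            # Keep a label only when it is not blank and not a repeat.
            if label != blank and label != prev:
                seq.append(label)
            prev = label
        decoded.append(seq)
    return decoded

Beam-search decoding, optionally combined with a language model, is the standard alternative when best-path output is too noisy.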
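The reported metric, Word Error Rate, is the token-level edit distance between the hypothesized and reference gloss sequences, normalized by the reference length. A minimal sketch follows; the gloss strings in the example are invented for illustration.

def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference),
    via dynamic-programming edit distance over gloss tokens."""
    r, h = len(reference), len(hypothesis)
    # d[i][j]: edit distance between first i reference / j hypothesis tokens.
    d = [[0] * (h + 1) for _ in range(r + 1)]
    for i in range(r + 1):
        d[i][0] = i
    for j in range(h + 1):
        d[0][j] = j
    for i in range(1, r + 1):
        for j in range(1, h + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[r][h] / r

# One substituted gloss in a four-gloss reference -> WER = 0.25 (25%).
print(word_error_rate("I GO SCHOOL TOMORROW".split(),
                      "I GO HOME TOMORROW".split()))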

