Abstract:
Sign language is an independent language that conveys meaning through gestures and body language. People who rely on sign language often cannot communicate effectively with those who do not know it, so an application that translates sign language into text would be useful to many people. Studies on sign language recognition have been conducted for various sign languages around the world. Since signs and linguistic features differ from one sign language to another, an algorithm that recognizes one sign language may not be applicable to another. For Ethiopian Sign Language (EthSL) recognition, several studies have been conducted, but most are limited to recognizing fingerspelling and isolated words. Only one study has addressed continuous EthSL recognition; however, it relies on specialized equipment such as a Kinect sensor, and its recognition accuracy is low. To fill these gaps, we propose a continuous EthSL recognition model based on a bidirectional long short-term memory (BiLSTM) network and Connectionist Temporal Classification (CTC). This study takes video captured with a digital camera as input. First, we collected a total of 420 sentence-level sign videos covering 10 unique sentences.
After that, video frames are extracted and passed through a sequence of preprocessing steps such as resizing, noise removal, and segmentation.
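As a rough illustration of this stage, the sketch below uses OpenCV; the 224x224 target size, Gaussian denoising, and YCrCb skin-colour segmentation are our illustrative assumptions, since the abstract does not fix these choices.

```python
import cv2

def extract_and_preprocess(video_path, size=(224, 224)):
    """Extract frames from one sign video and apply basic preprocessing."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, size)              # resizing
        frame = cv2.GaussianBlur(frame, (5, 5), 0)   # noise removal
        # Segmentation sketched as skin-colour thresholding in YCrCb;
        # the thresholds are common illustrative values, not the
        # study's actual segmentation method.
        ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
        mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
        frames.append(cv2.bitwise_and(frame, frame, mask=mask))
    cap.release()
    return frames
```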
Following that, each frame's spatial features are extracted with a Convolutional Neural Network (CNN) and stored in a single .csv file.
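A minimal sketch of this feature-extraction step, assuming a TensorFlow/Keras pipeline; the pretrained MobileNetV2 backbone, 224x224 input, and CSV layout are illustrative assumptions rather than the study's exact setup.

```python
import csv
import numpy as np
import tensorflow as tf

# A pretrained MobileNetV2 backbone is an illustrative stand-in;
# the abstract does not name the CNN architecture actually used.
cnn = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet",
    input_shape=(224, 224, 3), pooling="avg")

def frames_to_csv(frames, video_id, csv_path="features.csv"):
    """Append one feature row per frame to a single .csv file."""
    batch = tf.keras.applications.mobilenet_v2.preprocess_input(
        np.asarray(frames, dtype=np.float32))
    features = cnn.predict(batch, verbose=0)   # shape: (num_frames, 1280)
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        for t, row in enumerate(features):
            writer.writerow([video_id, t, *row])
```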
Finally, temporal dependencies are modeled with a BiLSTM. To avoid explicit temporal segmentation and achieve end-to-end continuous EthSL recognition, we apply CTC on top of the network.
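The sketch below shows how such a BiLSTM-CTC head can be assembled in Keras; the 128 hidden units and the vocabulary size are assumed values, and the softmax includes one extra class for the CTC blank.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 30   # number of distinct sign glosses; illustrative value
FEAT_DIM = 1280   # per-frame CNN feature size from the previous step

# Variable-length sequences of frame features pass through a BiLSTM;
# the softmax covers the vocabulary plus one extra CTC "blank" class.
feats = layers.Input(shape=(None, FEAT_DIM), name="frame_features")
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(feats)
probs = layers.Dense(VOCAB_SIZE + 1, activation="softmax")(x)
model = models.Model(feats, probs)

def ctc_loss(labels, probs, input_length, label_length):
    # Keras' built-in CTC cost aligns frame-level outputs with the
    # shorter gloss sequence, removing the need for manual segmentation.
    return tf.keras.backend.ctc_batch_cost(
        labels, probs, input_length, label_length)
```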
We conducted three experiments to select the proposed model. Experimentally, Long Short-Term Memory (LSTM) with CTC and Gated Recurrent Units (GRU) with CTC achieved Word Error Rates (WER) of 33% and 38%, respectively, while BiLSTM-CTC achieved a 32% WER. These results show that BiLSTM-CTC attains the highest recognition accuracy of the three models.
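For reference, WER is the word-level edit distance between the recognized sentence and the reference, i.e. (substitutions + deletions + insertions) divided by the number of reference words; a minimal sketch:

```python
def wer(reference, hypothesis):
    """Word Error Rate: (S + D + I) / N via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                               # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                               # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```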
Keywords: Convolutional Neural Network, Bidirectional Long Short-Term Memory, Connectionist Temporal Classification, Continuous Ethiopian Sign Language Recognition.